Preface


This is the first draft of the Lecture Notes for Mathematical Techniques III
(PHY 317), a course offered in the Physics Department of Queen Mary and
Westfield College (University of London). These notes are loosely based on
pre-existing notes by Professor John Charap. The notes contain all that is
said in lecture and sometimes more. The extra bits are typeset in smaller font
and are adorned with one or two “dangerous bend” signs as in the next
paragraphs.


© Most paragraphs like this fill gaps in the main presentation (e.g., proofs, mathematical
remarks,...). They contain material which, although necessary for the logical coherence of
the presentation, may be skipped at a first reading or ignored by the less mathematically
inclined student who is not interested in proofs,.... They are not an essential part of the
course, although I believe they are an essential part of the topic.

© © Most paragraphs like this contain material which is generally more advanced than the rest
of the lectures, but which I personally find interesting and have found useful at one time
or other. They are not an essential part of the course, but I have included them in the
hope that some of you might find them interesting enough to make the detour.


Some remarks about notation. Terms which are being defined for the 
first time appear in bold sans-serif type. Although the notation will be 
introduced as we go, here is a summary of the main notational conventions: 


• R and C stand for the sets of real and complex numbers, respectively;

• vector spaces, subspaces,... are denoted by so-called “blackboard bold”
  uppercase Latin letters: V, W,...;

• abstract vectors are denoted by bold lowercase Latin letters: v, w,...;

• linear maps are denoted by uppercase Latin letters A, B,..., except
  for the identity map which is denoted 1;

• column vectors are denoted by sans-serif lowercase Latin letters: v,
  w,...;

• matrices are denoted by sans-serif uppercase Latin letters: A, B,....
  The identity matrix will be denoted I.


The notes are not yet complete: in particular many of the asides are still 
to be completed, and the introductions have to be rewritten in light of what 
they are meant to introduce: they were written in advance in most cases. 
Many diagrams are missing, and many more examples and applications need 
to be added. The next stage in the development of the notes will consist 
in some changes in the visual layout, to break the monotony of the present 
style, and to make the exercises and the problems an integral part of the 
notes. The solutions, of course, will be available separately. 
Chapter 1


Linear Algebra 


In this part of the course we will review some basic linear algebra. The 
topics covered include: real and complex vector spaces and linear maps, 
bases, matrices, inner products, eigenvalues and eigenvectors. We start from 
the familiar setting in two dimensions and introduce the necessary formalism 
to be able to work with vectors in an arbitrary number of dimensions. We 
end the chapter with a physical application: the study of normal modes of 
an oscillatory system. 


1.1 Vector spaces 


Physics requires both scalar quantities like mass, temperature and charge,
each of which is uniquely specified by its magnitude in some units (e.g.,
300 K, 7 kg), and vectorial quantities like velocity, force and angular
momentum, which are specified by both a magnitude and a direction.

In the first part of the course we will study the general features shared 
by these vectorial quantities. As this is a course in mathematical techniques, 
we must abstract what these quantities have in common (the ‘mathematical’ 
part) while at the same time keeping a pragmatic perspective throughout 
(the ‘techniques’ part). This is not a mathematics course, but nevertheless a 
certain amount of formalism is needed. Some of you may not have seen formal 
definitions before, so we will start by motivating the notion of a vector space. 
For definiteness we will consider displacements in two dimensions; that is, in 
the plane. 


1.1.1 Displacements in the plane 


Every displacement in the plane has an initial or starting point and a final 
point. We will only consider displacements which have a common starting 
point: the origin. 

Any point in the plane is then understood as the final point of a
displacement from the origin. We will depict such displacements by an arrow
starting at the origin and ending at the final point. We will denote such
displacements by boldfaced letters, like u, v. In lecture it is hard to write
in boldface, so we use the notation $\vec{u}$, $\vec{v}$, which is not just
easier to write but has the added benefit of being mnemonic, since the arrow
reminds us that it is a displacement. We will say that displacements like
u, v are vectors.

What can one do with vectors? 

For example, vectors can be multiplied by real numbers (the scalars). If
$\lambda > 0$ is a positive real number and v is a vector, then $\lambda v$
is a vector pointing in the same direction as v but $\lambda$ times as long
as v, e.g., 2v is twice as long as v but points in the same direction. In the
same manner, $-\lambda v$ is a vector pointing in the direction opposite to v
but $\lambda$ times as long as v. We call this operation scalar
multiplication. This operation satisfies two properties which are plain to
see from the pictures. The first says that if v is any vector and $\lambda$
and $\mu$ are real numbers, then $\lambda(\mu v) = (\lambda\mu) v$. The second
property is totally obvious from the picture: $1 v = v$.

You should also be familiar from the study of, say, forces, with the fact 
that vectors can be added. 

Indeed, if u and v are vectors, then their sum u + v is 
the diagonal from the origin to the opposite vertex in the 
parallelogram defined by u and v, as in the picture. This 
operation is called vector addition or simply addition. It 
follows from the picture that u +v = v + u, so that we get 
the same result regardless of the order in which we add the 
vectors. One says that vector addition is commutative. 

Vector addition is also associative. This means that, as 
can be seen in the picture, when adding three vectors u, v, 
and w it does not matter whether we first add u and v and 
add w to the result: (u + v) + w or whether we first add 
v and w and add the result to u: u + (v + w). 




Another easy property of vector addition is the existence of a vector 0
which, when added to any vector v, gives back v again; that is,

$$0 + v = v \quad \text{for all vectors } v.$$

Clearly the zero vector 0 corresponds to the trivial displacement which starts
and ends at the origin, or in other words, to no displacement at all.
Similarly, given any vector v there is a vector $-v$ which obeys
$v + (-v) = 0$. We will often employ the notation $u - v$ to denote
$u + (-v)$.
Finally, notice that scalar multiplication and addition are compatible, in
the sense that they can be performed in either order:

$$\lambda (u + v) = \lambda u + \lambda v \quad \text{and} \quad (\lambda + \mu) v = \lambda v + \mu v .$$

The former identity says that scalar multiplication is distributive over vector
addition. Notice that, in particular, it follows from the latter identity
(taking $\lambda = \mu = 0$) that $0\, v = 0$ for all v.


1.1.2 Displacements in the plane (revisited) 


There is no conceptual reason why one should not consider displacements 
in space, i.e., in three dimensions, as opposed to the plane. The pictures 
get a little harder to draw, but in principle it can still be done with better 
draughtsmanship than mine. In physics, though, one needs to work with 
vectors in more than three dimensions—in fact, as in Quantum Mechanics, 
one often needs to work with vectors in an infinite number of dimensions. 
Pictures like the ones above then become of no use, and one needs to develop 
a notation we can calculate with. 

Let us consider again the displacements in the plane, but this time with 
a more algebraic notation. 

The first thing we do is to draw two cartesian axes centred at the origin:
axis 1 and axis 2. Then every displacement v from the origin can be written
as an ordered pair $(v_1, v_2)$ of real numbers, corresponding to the
components of the displacement v along the cartesian axes, as in the figure.

Let us define the set

$$\mathbb{R}^2 = \{ (v_1, v_2) \mid v_i \in \mathbb{R} \text{ for } i = 1, 2 \}$$

of ordered pairs of real numbers.
The above notation may need some explaining. The notation
‘$v_i \in \mathbb{R}$’ is simply shorthand for the phrase ‘$v_i$ is a real
number’, whereas the notation
‘$\{ (v_1, v_2) \mid v_i \in \mathbb{R} \text{ for } i = 1, 2 \}$’ is
shorthand for the phrase ‘the set consisting of pairs $(v_1, v_2)$ such that
both $v_1$ and $v_2$ are real numbers.’

The set $\mathbb{R}^2$ is in one-to-one correspondence with the set of
displacements, for clearly every displacement gives rise to one such pair and
every such pair gives rise to a displacement. We can therefore try to guess
how to define the operations of vector addition and scalar multiplication in
$\mathbb{R}^2$ in such a way that they correspond to the way they are defined
for displacements.

From the pictures defining addition and scalar multiplication, one sees
that if $\lambda \in \mathbb{R}$ is a real number, then

$$\lambda (v_1, v_2) = (\lambda v_1, \lambda v_2) , \qquad \text{(scalar multiplication)}$$

and also

$$(u_1, u_2) + (v_1, v_2) = (u_1 + v_1, u_2 + v_2) . \qquad \text{(addition)}$$

The zero vector corresponds to no displacement at all, hence it is given
by the pair corresponding to the origin $(0,0)$. It follows from the addition
rule that

$$(0,0) + (v_1, v_2) = (v_1, v_2) .$$

Similarly, $-(v_1, v_2) = (-v_1, -v_2)$. In fact it is not hard to show (do
it!) that addition and scalar multiplication obey the same properties as they
did for displacements.

The good thing about this notation is that there is no reason why we
should restrict ourselves to pairs. Indeed, why not consider the set

$$\mathbb{R}^N = \{ (v_1, v_2, \ldots, v_N) \mid v_i \in \mathbb{R} \text{ for } i = 1, 2, \ldots, N \} ,$$

of ordered N-tuples of real numbers? We can define addition and scalar
multiplication in the same way as above:

(addition)

$$(u_1, u_2, \ldots, u_N) + (v_1, v_2, \ldots, v_N) = (u_1 + v_1, u_2 + v_2, \ldots, u_N + v_N) ,$$

(multiplication by scalars)

$$\lambda (v_1, v_2, \ldots, v_N) = (\lambda v_1, \lambda v_2, \ldots, \lambda v_N) \quad \text{for } \lambda \in \mathbb{R} .$$

In the homework you are asked to prove that these operations on
$\mathbb{R}^N$ obey the same properties that displacements do: commutativity,
associativity, distributivity,... These properties can be formalised in the
concept of an abstract vector space.
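The componentwise rules above translate directly into code. The following is a minimal sketch, not part of the course material: the function names add and scale and the sample vectors are our own illustrative choices, and checking a few identities on sample data is of course no substitute for the proof asked for in the homework.

```python
from fractions import Fraction


def add(u, v):
    """Componentwise sum of two N-tuples with the same number of components."""
    if len(u) != len(v):
        raise ValueError("both vectors must have the same number of components")
    return tuple(ui + vi for ui, vi in zip(u, v))


def scale(lam, v):
    """Multiply every component of the N-tuple v by the scalar lam."""
    return tuple(lam * vi for vi in v)


# Sample data (arbitrary choices, N = 3). Fractions keep the arithmetic exact.
u = (Fraction(1), Fraction(2), Fraction(3))
v = (Fraction(-4), Fraction(0), Fraction(1, 2))
zero = (Fraction(0),) * 3
lam, mu = Fraction(2), Fraction(-3)

# Each entry records whether one of the properties holds for the sample data.
checks = {
    "commutativity": add(u, v) == add(v, u),
    "zero vector": add(zero, v) == v,
    "additive inverse": add(v, scale(-1, v)) == zero,
    "distributivity": scale(lam, add(u, v)) == add(scale(lam, u), scale(lam, v)),
    "sum of scalars": scale(lam + mu, v) == add(scale(lam, v), scale(mu, v)),
}

for name, holds in checks.items():
    print(f"{name}: {holds}")
```

Exact rational arithmetic is used here so that the equalities are not spoiled by floating-point rounding.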


1.1.3 Abstract vector spaces 


We are finally ready to formalise the observations made above into the
definition of an abstract vector space. We say that this is an abstract vector
space, because it does not refer to any concrete example.

A real vector space consists of the following data:

• Two sets:

  - the set of vectors, which we shall denote V, and whose elements
    we will write as u, v, w, ..., and

  - the set of scalars, which for a real vector space is simply the set
    R of real numbers. We will use lowercase Greek letters from the
    middle of the alphabet: $\lambda$, $\mu$, ... to represent real numbers.

• Two operations:

  - Scalar multiplication, which takes a scalar $\lambda$ and a vector v and
    produces another vector $\lambda v$. One often abbreviates this as

    $$\text{scalar multiplication}: \mathbb{R} \times V \to V , \qquad (\lambda, v) \mapsto \lambda v .$$

  - Vector addition, which takes two vectors u and v and produces a
    third vector denoted u + v. Again one can abbreviate this as

    $$\text{vector addition}: V \times V \to V , \qquad (u, v) \mapsto u + v .$$

• Eight properties (or axioms):

  V1 (associativity) $(u + v) + w = u + (v + w)$ for all u, v and w;

  V2 (commutativity) $u + v = v + u$ for all u and v;

  V3 There exists a zero vector 0 which obeys $0 + v = v$ for all v;

  V4 For any given v, there exists a vector $-v$ such that $v + (-v) = 0$;

  V5 $\lambda (\mu v) = (\lambda \mu) v$ for all v, $\lambda$ and $\mu$;

  V6 $1 v = v$ for all v;

  V7 $(\lambda + \mu) v = \lambda v + \mu v$ for all $\lambda$ and $\mu$ and v;

  V8 (distributivity) $\lambda (u + v) = \lambda u + \lambda v$ for all $\lambda$, u and v.


This formidable-looking definition might at first seem to be something you
would rather forget about. Actually you will see that after using it in
practice it will become, if not intuitive, at least more sensible. Formal
definitions like the one above are meant to capture the essence of what is
being defined. Every concrete vector space is an instance of an abstract
vector space, and it will inherit all the properties of an abstract vector
space. In other words, we can be sure that any result that we obtain for an
abstract vector space will also hold for any concrete example.

A typical use of the definition is recognising vector spaces. To go about
this one has to identify the sets of vectors and scalars, and the operations of
scalar multiplication and vector addition, and then check that all eight axioms
are satisfied. In the homework I ask you to do this for two very different
looking spaces: $\mathbb{R}^N$, which we have already met, and the set
consisting of real-valued functions on the interval $[-1, 1]$. In the course
of these lectures we will see many others.


You may wonder whether all eight axioms are necessary. For example, you may question
the necessity of V4, given V3. Consider the following subset of $\mathbb{R}^2$:

$$\{ (v_1, v_2) \mid v_i \in \mathbb{R} \text{ and } v_2 \geq 0 \} \subset \mathbb{R}^2$$

consisting of pairs of real numbers where the second real number in the pair is non-negative.
In terms of displacements, it corresponds to the upper half-plane. You can check that the
first two axioms V1 and V2 are satisfied, and that the zero vector $(0,0)$ belongs to this
subset. However $-(v_1, v_2) = (-v_1, -v_2)$, whence if $v_2$ is non-negative, $-v_2$ cannot be
non-negative unless $v_2 = 0$. Therefore V4 is not satisfied. In fact, neither are V5, V7 and
V8 unless we restrict the scalars to be non-negative real numbers. A more challenging
exercise is to determine whether V6 is really necessary.


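The zero vector 0 of axiom V3 is unique. To see this notice that if there were another $0'$
which also satisfies V3, then

$$\begin{aligned}
0' &= 0 + 0' && \text{(by V3 for } 0\text{)} \\
   &= 0' + 0 && \text{(by V2)} \\
   &= 0 . && \text{(by V3 for } 0'\text{)}
\end{aligned}$$

Similarly the vector $-v$ in V4 is also unique. In fact, suppose that there are two vectors
$u_1$ and $u_2$ which satisfy: $v + u_1 = 0$ and $v + u_2 = 0$. Then they are equal:

$$\begin{aligned}
u_1 &= 0 + u_1 && \text{(by V3)} \\
    &= (v + u_2) + u_1 && \text{(by hypothesis)} \\
    &= v + (u_2 + u_1) && \text{(by V1)} \\
    &= v + (u_1 + u_2) && \text{(by V2)} \\
    &= (v + u_1) + u_2 && \text{(by V1)} \\
    &= 0 + u_2 && \text{(by hypothesis)} \\
    &= u_2 . && \text{(by V3)}
\end{aligned}$$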


A final word on notation: although we have defined a real vector space 
as two sets, vectors V and real scalars R, and two operations satisfying some 
axioms, one often simply says that ‘V is a real vector space’ leaving the other 
bits in the definition implicit. Similarly in what follows, and unless otherwise 
stated, we will implicitly assume that the scalars are real, so that whenever 
we say ‘V is a vector space’ we shall mean that V is a real vector space. 


1.1.4 Vector subspaces 


A notion related to that of a vector space is that of a vector subspace.
Suppose that V is a vector space and let $W \subset V$ be a subset. This
means that W consists of some (but not necessarily all) of the vectors in V.
Since V is a vector space, we know that we can add vectors in W and multiply
them by scalars, but does that make W into a vector space in its own right?
As we saw above with the example of the upper half-plane, not every subset W
will itself be a vector space. For this to be the case we have to make sure
that the following two axioms are satisfied:

S1 If v and w are vectors in W, then so is v + w; and

S2 For any scalar $\lambda \in \mathbb{R}$, if w is any vector in W, then so is $\lambda w$.

If these two properties are satisfied we say that W is a vector subspace of
V. One also often sees the phrases ‘W is a subspace of V’ and ‘W is a linear
subspace of V.’

Let us make sure we understand what these two properties mean. For v
and w in W, v + w belongs to V because V is a vector space. The question
is whether v + w belongs to W, and S1 says that it does. Similarly, if
$w \in W$ is a vector in W and $\lambda \in \mathbb{R}$ is any scalar, then
$\lambda w$ belongs to V because V is a vector space. The question is whether
$\lambda w$ also belongs to W, and S2 says that it does.

You may ask whether we should not also require that the zero vector 0
belongs to W. In fact this is guaranteed by S2, because for any $w \in W$,
$0 = 0\, w$ (why?) which belongs to W by S2. From this point of view, it is S2
that fails in the example of the upper half-plane, since scalar multiplication
by a negative scalar $\lambda < 0$ takes vectors in the upper half-plane to
vectors in the lower half-plane.

Let us see a couple of examples. Consider the set $\mathbb{R}^3$ of ordered
triples of real numbers:

$$\mathbb{R}^3 = \{ (v_1, v_2, v_3) \mid v_i \in \mathbb{R} \text{ for } i = 1, 2, 3 \} ,$$

and consider the following subsets

• $W_1 = \{ (v_1, v_2, 0) \mid v_i \in \mathbb{R} \text{ for } i = 1, 2 \} \subset \mathbb{R}^3$,

• $W_2 = \{ (v_1, v_2, v_3) \mid v_i \in \mathbb{R} \text{ for } i = 1, 2, 3 \text{ and } v_3 \geq 0 \} \subset \mathbb{R}^3$, and

• $W_3 = \{ (v_1, v_2, 1) \mid v_i \in \mathbb{R} \text{ for } i = 1, 2 \} \subset \mathbb{R}^3$.

I will leave it to you as an exercise to show that $W_1$ obeys both S1 and S2,
whence it is a vector subspace of $\mathbb{R}^3$, whereas $W_2$ does not obey
S2, and $W_3$ does not obey either one. Can you think of a subset of
$\mathbb{R}^3$ which obeys S2 but not S1?
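As a sanity check before attempting the exercise, one can test S1 and S2 on a finite grid of sample vectors. The sketch below is only illustrative: the names SUBSETS, GRID, SCALARS and closure_report are our own, the condition defining W2 is taken to be v3 >= 0 (an assumption about the intended inequality), and a sampled test can refute closure but can never prove it.

```python
from itertools import product


def add(u, v):
    """Componentwise sum of two triples."""
    return tuple(ui + vi for ui, vi in zip(u, v))


def scale(lam, v):
    """Multiply every component of v by the scalar lam."""
    return tuple(lam * vi for vi in v)


# Membership tests for the three subsets of R^3 discussed in the text.
SUBSETS = {
    "W1": lambda v: v[2] == 0,
    "W2": lambda v: v[2] >= 0,  # assumption: the condition is v3 >= 0
    "W3": lambda v: v[2] == 1,
}

GRID = list(product(range(-2, 3), repeat=3))  # integer sample points in R^3
SCALARS = [-2, -1, 0, 1, 2]                   # sample scalars


def closure_report(member):
    """Return (S1 holds on the samples, S2 holds on the samples)."""
    samples = [v for v in GRID if member(v)]
    s1 = all(member(add(u, v)) for u, v in product(samples, repeat=2))
    s2 = all(member(scale(lam, v)) for lam in SCALARS for v in samples)
    return s1, s2


report = {name: closure_report(member) for name, member in SUBSETS.items()}
for name, (s1, s2) in report.items():
    print(f"{name}: S1 holds on samples = {s1}, S2 holds on samples = {s2}")
```

A result of False pins down a genuine counterexample among the samples; a result of True only says that none was found.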


1.1.5 Linear independence 


In this section we will introduce the concepts of linear independence and basis 
for a vector space; but before doing so we must introduce some preliminary 
notation. 

Let V be a vector space, $v_1, v_2, \ldots, v_N$ nonzero vectors in V, and
$\lambda_1, \lambda_2, \ldots, \lambda_N$ scalars, i.e., real numbers. Then
the vector in V given by

$$\sum_{i=1}^N \lambda_i v_i := \lambda_1 v_1 + \lambda_2 v_2 + \cdots + \lambda_N v_N ,$$

is called a linear combination of the $\{v_i\}$. The set W of all possible
linear combinations of the $\{v_1, v_2, \ldots, v_N\}$ is actually a vector
subspace of V, called the linear span of the $\{v_1, v_2, \ldots, v_N\}$ or
the vector subspace spanned by the $\{v_1, v_2, \ldots, v_N\}$.


Recall that in order to show that a subset of a vector space is a vector subspace it is
necessary and sufficient to show that it is closed under vector addition and under scalar
multiplication. Let us check this for the subset W of all linear combinations of the
$\{v_1, v_2, \ldots, v_N\}$. Let $w_1 = \sum_{i=1}^N \alpha_i v_i$ and
$w_2 = \sum_{i=1}^N \beta_i v_i$ be any two elements of W. Then

$$\begin{aligned}
w_1 + w_2 &= \sum_{i=1}^N \alpha_i v_i + \sum_{i=1}^N \beta_i v_i \\
          &= \sum_{i=1}^N (\alpha_i v_i + \beta_i v_i) && \text{(by V2)} \\
          &= \sum_{i=1}^N (\alpha_i + \beta_i) v_i , && \text{(by V7)}
\end{aligned}$$

which is clearly in W, being again a linear combination of the $\{v_1, v_2, \ldots, v_N\}$.
Also, if $\lambda$ is any real number and $w = \sum_{i=1}^N \alpha_i v_i$ is any vector in W,

$$\begin{aligned}
\lambda w &= \lambda \sum_{i=1}^N \alpha_i v_i \\
          &= \sum_{i=1}^N \lambda (\alpha_i v_i) && \text{(by V8)} \\
          &= \sum_{i=1}^N (\lambda \alpha_i) v_i , && \text{(by V5)}
\end{aligned}$$

which is again in W.
A set $\{v_1, v_2, \ldots, v_N\}$ of nonzero vectors is said to be linearly
independent if the equation

$$\sum_{i=1}^N \lambda_i v_i = 0$$

has only the trivial solution $\lambda_i = 0$ for all $i = 1, 2, \ldots, N$.
Otherwise the $\{v_i\}$ are said to be linearly dependent.

It is easy to see that if a set $\{v_1, v_2, \ldots, v_N\}$ of nonzero vectors
is linearly dependent, then one of the vectors, say, $v_i$, can be written as
a linear combination of the remaining $N - 1$ vectors. Indeed, suppose that
$\{v_1, v_2, \ldots, v_N\}$ is linearly dependent. This means that the
equation

$$\sum_{i=1}^N \lambda_i v_i = 0 \tag{1.1}$$

must have a nontrivial solution where at least one of the $\{\lambda_i\}$ is
different from zero. Suppose, for definiteness, that it is $\lambda_1$.
Because $\lambda_1 \neq 0$, we can divide equation (1.1) by $\lambda_1$ to
obtain:

$$v_1 + \sum_{i=2}^N \frac{\lambda_i}{\lambda_1} v_i = 0 ,$$

whence

$$v_1 = - \frac{\lambda_2}{\lambda_1} v_2 - \frac{\lambda_3}{\lambda_1} v_3 - \cdots - \frac{\lambda_N}{\lambda_1} v_N .$$

In other words, $v_1$ is a linear combination of the $\{v_2, \ldots, v_N\}$.
In general and in the same way, if $\lambda_i \neq 0$ then $v_i$ is a linear
combination of $\{v_1, \ldots, v_{i-1}, v_{i+1}, \ldots, v_N\}$.


Let us try to understand these definitions by working through some
examples.

We start, as usual, with displacements in the plane. Every nonzero
displacement defines a line through the origin. We say that two displacements
are collinear if they define the same line. In other words, u and v are
collinear if and only if $u = \lambda v$ for some $\lambda \in \mathbb{R}$.
Clearly, any two displacements in the plane are linearly independent provided
they are not collinear, as in the figure.

Now consider $\mathbb{R}^2$ and let $(u_1, u_2)$ and $(v_1, v_2)$ be two
nonzero vectors. When will they be linearly independent? From the definition,
this will happen provided that the equation

$$\lambda_1 (u_1, u_2) + \lambda_2 (v_1, v_2) = (0, 0)$$

has no other solutions but $\lambda_1 = \lambda_2 = 0$. This is a system of
linear homogeneous equations for the $\{\lambda_i\}$:

$$\begin{aligned}
u_1 \lambda_1 + v_1 \lambda_2 &= 0 \\
u_2 \lambda_1 + v_2 \lambda_2 &= 0 .
\end{aligned}$$

What must happen for this system to have a nontrivial solution? It will turn
out that the answer is that $u_1 v_2 = u_2 v_1$. We can see this as follows.
Multiply the top equation by $u_2$ and the bottom equation by $u_1$ and
subtract to get

$$(u_1 v_2 - u_2 v_1) \lambda_2 = 0 ,$$

whence either $u_1 v_2 = u_2 v_1$ or $\lambda_2 = 0$. Now multiply the top
equation by $v_2$ and the bottom equation by $v_1$ and subtract to get

$$(u_1 v_2 - u_2 v_1) \lambda_1 = 0 ,$$

whence either $u_1 v_2 = u_2 v_1$ or $\lambda_1 = 0$. Since a nontrivial
solution must have at least one of $\lambda_1$ or $\lambda_2$ nonzero, we are
forced to have $u_1 v_2 = u_2 v_1$.
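The criterion just derived is easy to turn into a test. The sketch below is illustrative only; the function names are our own. It also uses the converse of the argument above, which is quickly checked by substitution: when the two products agree, both $(\lambda_1, \lambda_2) = (v_1, -u_1)$ and $(v_2, -u_2)$ solve the system, and for nonzero vectors at least one of them is nontrivial.

```python
def linearly_independent(u, v):
    """True when u, v in R^2 satisfy u1*v2 != u2*v1."""
    return u[0] * v[1] != u[1] * v[0]


def nontrivial_solution(u, v):
    """Return (lam1, lam2) != (0, 0) with lam1*u + lam2*v = (0, 0), or None.

    Assumes u and v are nonzero, as in the text.
    """
    if linearly_independent(u, v):
        return None
    # When u1*v2 == u2*v1, both (v1, -u1) and (v2, -u2) solve the system;
    # pick one that is not (0, 0).
    if (v[0], u[0]) != (0, 0):
        return (v[0], -u[0])
    return (v[1], -u[1])


print(linearly_independent((1, 0), (0, 1)))   # the two coordinate directions
print(nontrivial_solution((1, 2), (2, 4)))    # collinear vectors
```

For the collinear pair (1, 2) and (2, 4) the function returns (2, -1), and indeed 2 (1, 2) - (2, 4) = (0, 0).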


1.1.6 Bases 


Let V be a vector space. A set $\{e_1, e_2, \ldots\}$ of nonzero vectors is
said to be a basis for V if the following two axioms are satisfied:

B1 The vectors $\{e_1, e_2, \ldots\}$ are linearly independent; and

B2 The linear span of the $\{e_1, e_2, \ldots\}$ is all of V; in other words,
   any v in V can be written as a linear combination of the
   $\{e_1, e_2, \ldots\}$.

The vectors $e_i$ in a basis are known as the basis elements.

There are two basic facts about bases which we mention without proof.
First of all, every vector space has a basis, and in fact, unless it is the trivial
vector space consisting only of 0, it has infinitely many bases. However not
every vector space has a finite basis; that is, a basis with a finite number
of elements. If a vector space does possess a finite basis
$\{e_1, e_2, \ldots, e_N\}$ then it is said to be finite-dimensional.
Otherwise it is said to be infinite-dimensional. We will deal mostly with
finite-dimensional vector spaces in this part of the course, although we will
have the chance of meeting some infinite-dimensional vector spaces later on.

The second basic fact is that if $\{e_1, e_2, \ldots, e_N\}$ and
$\{f_1, f_2, \ldots, f_M\}$ are two bases for a vector space V, then
$M = N$. In other words, every basis has the same number of elements, which
is therefore an intrinsic property of the vector space in question. This
number is called the dimension of the vector space. One says that V has
dimension N or that it is N-dimensional. In symbols, one writes this as
$\dim V = N$.

From what we have said before, any two displacements which are non- 
collinear provide a basis for the displacements on the plane. Therefore this 
vector space is two-dimensional. 

Similarly, any $(v_1, v_2)$ in $\mathbb{R}^2$ can be written as a linear
combination of $\{(1,0), (0,1)\}$:

$$(v_1, v_2) = v_1 (1, 0) + v_2 (0, 1) .$$

Therefore since $\{(1,0), (0,1)\}$ are linearly independent, they form a
basis for $\mathbb{R}^2$. This shows that $\mathbb{R}^2$ is also
two-dimensional.
More generally for $\mathbb{R}^N$, the set given by the N vectors

$$\{ (1, 0, \ldots, 0),\ (0, 1, \ldots, 0),\ \ldots,\ (0, 0, \ldots, 1) \}$$

is a basis for $\mathbb{R}^N$, called the canonical basis. This shows that
$\mathbb{R}^N$ has dimension N.

Let $\{v_1, v_2, \ldots, v_p\}$ be a set of p linearly independent vectors in
a vector space V of dimension $N \geq p$. Then they are a basis for the vector
subspace W of V which they span. If $p = N$ they span the full space V, whence
they are a basis for V. It is another basic fact that any set of linearly
independent vectors can be completed to a basis.

One final remark: the property B2 satisfied by a basis guarantees that 
any vector v can be written as a linear combination of the basis elements, 
but does not say whether this can be done in more than one way. In fact, 
the linear combination turns out to be unique. 


Let us prove this. For simplicity, let us work with a finite-dimensional vector space V
with a basis $\{e_1, e_2, \ldots, e_N\}$. Suppose that a vector $v \in V$ can be written as a
linear combination of the $\{e_i\}$ in two ways:

$$v = \sum_{i=1}^N v_i e_i \quad \text{and} \quad v = \sum_{i=1}^N v'_i e_i .$$

We will show that $v_i = v'_i$ for all i. To see this consider

$$\begin{aligned}
0 &= v - v \\
  &= \sum_{i=1}^N v_i e_i - \sum_{i=1}^N v'_i e_i \\
  &= \sum_{i=1}^N (v_i - v'_i) e_i .
\end{aligned}$$

But because of B1, the $\{e_i\}$ are linearly independent, and by definition this means that
the last of the above equations admits only the trivial solution $v_i - v'_i = 0$ for all i. The
numbers $\{v_i\}$ are called the components of v relative to the basis $\{e_i\}$.
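To make the notion of components concrete, here is a small sketch which computes the components of a vector of R^2 relative to an arbitrary basis by solving the corresponding pair of linear equations with Cramer's rule. The function name components and the sample basis are our own illustrative choices; integer inputs are assumed so that exact fractions can be used.

```python
from fractions import Fraction


def components(v, e1, e2):
    """Return (c1, c2) such that c1*e1 + c2*e2 = v, for a basis {e1, e2} of R^2.

    The vectors are pairs of integers; the result is a pair of exact fractions.
    """
    det = e1[0] * e2[1] - e1[1] * e2[0]
    if det == 0:
        raise ValueError("e1 and e2 are linearly dependent, so they are not a basis")
    # Cramer's rule for the system c1*e1 + c2*e2 = v.
    c1 = Fraction(v[0] * e2[1] - v[1] * e2[0], det)
    c2 = Fraction(e1[0] * v[1] - e1[1] * v[0], det)
    return c1, c2


# Components of (3, 1) relative to the basis {(1, 1), (1, -1)}.
c1, c2 = components((3, 1), (1, 1), (1, -1))
print(c1, c2)
```

In this example the components are 2 and 1, since 2 (1, 1) + 1 (1, -1) = (3, 1); relative to the canonical basis the same vector has components 3 and 1, which illustrates that components depend on the choice of basis.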


Bases can be extremely useful in calculations with vector spaces. A clever 
choice of basis can help tremendously towards the solution of a problem, just 
like a bad choice of basis can make the problem seem very complicated. We 
will see more of them later, but first we need to introduce the second main 
concept of linear algebra, that of a linear map. 


1.2 Linear maps 


In the previous section we have learned about vector spaces by studying 
objects (subspaces, bases,...) living in a fixed vector space. In this section 
we will look at objects which relate different vector spaces. These objects 
are called linear maps. 


1.2.1 Linear maps 


Let V and W be two vector spaces, and consider a map $A : V \to W$ assigning
to each vector v in V a unique vector $A(v)$ in W. We say that A is a linear
map (or a homomorphism) if it satisfies the following two properties:

L1 For all $v_1$ and $v_2$ in V, $A(v_1 + v_2) = A(v_1) + A(v_2)$; and

L2 For all v in V and $\lambda \in \mathbb{R}$, $A(\lambda v) = \lambda A(v)$.


In other words, a linear map is compatible with the operations of vector 
addition and scalar multiplication which define the vector space; that is, it 
does not matter whether we apply the map A before or after performing 
these operations: we will get the same result. One says that ‘linear maps 
respect addition and scalar multiplication.’ 
Any linear map $A : V \to W$ sends the zero vector in V to the zero vector
in W. Let us see this. (We will use the notation 0 both for the zero vector in
V and for the zero vector in W as it should be clear from the context which
one we mean.) Let v be any vector in V and let us apply A to $0 + v$:

$$A(0 + v) = A(0) + A(v) ; \qquad \text{(by L1)}$$

but because $0 + v = v$,

$$A(v) = A(0) + A(v) ,$$

which says that $A(0) = 0$ (add $-A(v)$ to both sides), since the zero
vector is unique.




Any linear map $A : V \to W$ gives rise to a vector subspace of V, known as the kernel of A,
and written ker A. It is defined as the subspace of V consisting of those vectors in V which
get mapped to the zero vector of W. In other words,

$$\ker A := \{ v \in V \mid A(v) = 0 \in W \} .$$

To check that $\ker A \subset V$ is really a vector subspace, we have to make sure that axioms S1
and S2 are satisfied. Suppose that $v_1$ and $v_2$ belong to ker A. Let us show that so does
their sum $v_1 + v_2$:

$$\begin{aligned}
A(v_1 + v_2) &= A(v_1) + A(v_2) && \text{(by L1)} \\
             &= 0 + 0 && \text{(because } A(v_i) = 0\text{)} \\
             &= 0 , && \text{(by V3 for } W\text{)}
\end{aligned}$$

$$\therefore \quad v_1 + v_2 \in \ker A .$$

This shows that S1 is satisfied. Similarly, if $v \in \ker A$ and $\lambda \in \mathbb{R}$ is any scalar, then

$$\begin{aligned}
A(\lambda v) &= \lambda A(v) && \text{(by L2)} \\
             &= \lambda\, 0 && \text{(because } A(v) = 0\text{)} \\
             &= 0 , && \text{(follows from V7 for } W\text{)}
\end{aligned}$$

$$\therefore \quad \lambda v \in \ker A ;$$

whence S2 is also satisfied. Notice that we used both properties L1 and L2 of a linear map.


There is also a vector subspace, this time of W, associated with $A : V \to W$. It is called
the image of A, and written im A. It consists of those vectors in W which can be written
as $A(v)$ for some $v \in V$. In other words,

$$\operatorname{im} A := \{ w \in W \mid w = A(v) \text{ for some } v \in V \} .$$

To check that $\operatorname{im} A \subset W$ is a vector subspace we must check that S1 and S2 are satisfied.
Let us do this. Suppose that $w_1$ and $w_2$ belong to the image of A. This means that there
are vectors $v_1$ and $v_2$ in V which obey $A(v_i) = w_i$ for $i = 1, 2$. Therefore,

$$\begin{aligned}
A(v_1 + v_2) &= A(v_1) + A(v_2) && \text{(by L1)} \\
             &= w_1 + w_2 ,
\end{aligned}$$

whence $w_1 + w_2$ belongs to the image of A. Similarly, if $w = A(v)$ belongs to the image
of A and $\lambda \in \mathbb{R}$ is any scalar,

$$\begin{aligned}
A(\lambda v) &= \lambda A(v) && \text{(by L2)} \\
             &= \lambda w ,
\end{aligned}$$

whence $\lambda w$ also belongs to the image of A.


As an example, consider the linear transformation $A : \mathbb{R}^2 \to \mathbb{R}^2$ defined by
$(x, y) \mapsto (x - y, y - x)$. Its kernel and image are pictured below:

[Figure: ker A and im A drawn as lines in the plane.]
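In the absence of the picture, the kernel and image of this particular map can be explored numerically. The sketch below is only illustrative (the grid of integer sample points and the variable names are our own choices): it applies A to each sample point and collects the points sent to (0, 0) together with the values attained.

```python
def A(v):
    """The linear transformation (x, y) -> (x - y, y - x) of the example."""
    x, y = v
    return (x - y, y - x)


# Integer sample points with coordinates between -3 and 3.
grid = [(x, y) for x in range(-3, 4) for y in range(-3, 4)]

# Sample points which A sends to the zero vector, i.e. which lie in ker A.
kernel_samples = [v for v in grid if A(v) == (0, 0)]

# Values attained by A on the sample points, i.e. points of im A.
image_samples = sorted({A(v) for v in grid})

print(kernel_samples)
print(image_samples)
```

The kernel samples all have x = y and the image samples all have the form (t, -t), in agreement with the fact that ker A is the line y = x and im A is the line y = -x.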


A linear map $A : V \to W$ is said to be one-to-one (or injective or a monomorphism) if
ker A = 0. The reason for the name is the following. Suppose that $A(v_1) = A(v_2)$. Then
because of linearity, $A(v_1 - v_2) = 0$, whence $v_1 - v_2$ belongs to the kernel. Since the
kernel is zero, we have that $v_1 = v_2$.

Similarly a linear map $A : V \to W$ is said to be onto (or surjective or an epimorphism)
if im A = W, so that every vector of W is the image under A of some vector in V. If
this vector is unique, so that A is also one-to-one, we say that A is an isomorphism. If
$A : V \to W$ is an isomorphism, one says that V is isomorphic to W, and we write this as
$V \cong W$. As we will see below, ‘being isomorphic to’ is an equivalence relation.

Notice that if V is an N-dimensional real vector space, any choice of basis $\{e_i\}$ induces an
isomorphism $A : V \to \mathbb{R}^N$, defined by sending the vector $v = \sum_{i=1}^N v_i e_i$ to the ordered
N-tuple made out from its components $(v_1, v_2, \ldots, v_N)$ relative to the basis. Therefore we
see that all N-dimensional vector spaces are isomorphic to $\mathbb{R}^N$, and hence to each other.


An important property of linear maps is that once we know how they act
on a basis, we know how they act on any vector in the vector space. Indeed,
suppose that $\{e_1, e_2, \ldots, e_N\}$ is a basis for an N-dimensional
vector space V. Any vector $v \in V$ can be written uniquely as a linear
combination of the basis elements:

$$v = \sum_{i=1}^N v_i e_i .$$

Let $A : V \to W$ be a linear map. Then

$$\begin{aligned}
A(v) &= A\left( \sum_{i=1}^N v_i e_i \right) \\
     &= \sum_{i=1}^N A(v_i e_i) && \text{(by L1)} \\
     &= \sum_{i=1}^N v_i A(e_i) . && \text{(by L2)}
\end{aligned}$$

Therefore if we know $A(e_i)$ for $i = 1, 2, \ldots, N$ we know A on any vector.
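The computation above can be read as a recipe: store the images $A(e_i)$ of the basis vectors and rebuild $A(v)$ as the linear combination $\sum_i v_i A(e_i)$. The sketch below does this for maps out of $\mathbb{R}^N$ with its canonical basis; the function name linear_map_from_basis_images is our own, and the sample images are chosen so as to reproduce the map $(x, y) \mapsto (x - y, y - x)$.

```python
def linear_map_from_basis_images(images):
    """Return the linear map A determined by A(e_i) = images[i].

    Here e_i is the i-th canonical basis vector of R^N, where N = len(images),
    and each image is a tuple of the same length M, so that A maps R^N to R^M.
    """
    n = len(images)
    m = len(images[0])

    def A(v):
        if len(v) != n:
            raise ValueError(f"expected a vector with {n} components")
        # A(v) = sum_i v_i A(e_i), computed component by component.
        return tuple(sum(v[i] * images[i][k] for i in range(n)) for k in range(m))

    return A


# Choosing A(e_1) = (1, -1) and A(e_2) = (-1, 1) gives (x, y) -> (x - y, y - x).
A = linear_map_from_basis_images([(1, -1), (-1, 1)])
print(A((5, 2)))
```

Only the N images are stored, yet the resulting function can be applied to any vector, which is exactly the content of the remark above.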


© © The dual space.


1.2.2 Composition of linear maps


Linear maps can be composed to produce new linear maps. Let $A : V \to W$
and $B : U \to V$ be linear maps connecting three vector spaces U, V and W.
We can define a third map $C : U \to W$ by composing the two maps:

$$U \xrightarrow{\;B\;} V \xrightarrow{\;A\;} W .$$

In other words, if $u \in U$ is any vector, then the action of C on it is
defined by first applying B to get $B(u)$ and then applying A to the result
to obtain $A(B(u))$. The resulting map is written $A \circ B$, so that one
has the composition rule:

$$(A \circ B)(u) := A(B(u)) . \tag{1.2}$$


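This new map is linear because B and A are, as we now show. It respects
addition:

$$\begin{aligned}
(A \circ B)(u_1 + u_2) &= A(B(u_1 + u_2)) \\
                       &= A(B(u_1) + B(u_2)) && \text{(by L1 for } B\text{)} \\
                       &= A(B(u_1)) + A(B(u_2)) && \text{(by L1 for } A\text{)} \\
                       &= (A \circ B)(u_1) + (A \circ B)(u_2) ;
\end{aligned}$$

and it also respects scalar multiplication:

$$\begin{aligned}
(A \circ B)(\lambda u) &= A(B(\lambda u)) \\
                       &= A(\lambda B(u)) && \text{(by L2 for } B\text{)} \\
                       &= \lambda A(B(u)) && \text{(by L2 for } A\text{)} \\
                       &= \lambda (A \circ B)(u) .
\end{aligned}$$

Thus $A \circ B$ is a linear map, known as the composition of A and B. One
usually reads $A \circ B$ as ‘B composed with A’ (notice the order!) or
‘A pre-composed with B.’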
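The composition rule (1.2) is literally function composition, as the following sketch illustrates. The two maps B and A below are our own illustrative choices, with B going from R^2 to R^3 and A from R^3 to R^2; the checks at the end test L1 and L2 for the composite on sample vectors only.

```python
def compose(A, B):
    """Return the map A o B defined by (A o B)(u) = A(B(u))."""
    return lambda u: A(B(u))


def B(u):
    """A linear map R^2 -> R^3 (illustrative choice)."""
    x, y = u
    return (x, y, x + y)


def A(v):
    """A linear map R^3 -> R^2 (illustrative choice)."""
    x, y, z = v
    return (x + z, y - z)


C = compose(A, B)  # C = A o B maps R^2 to R^2

u1, u2, lam = (1, 2), (3, -1), 3
u_sum = (u1[0] + u2[0], u1[1] + u2[1])
u_scaled = (lam * u1[0], lam * u1[1])

# L1 and L2 for the composite, checked on the sample vectors.
respects_addition = C(u_sum) == (C(u1)[0] + C(u2)[0], C(u1)[1] + C(u2)[1])
respects_scaling = C(u_scaled) == (lam * C(u1)[0], lam * C(u1)[1])
print(C(u1), C(u2), respects_addition, respects_scaling)
```

Notice that in compose(A, B) the map B is applied first, in agreement with the order convention stressed above.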


Notice that if A and B are isomorphisms, then so is $A \circ B$. In other words, composition
of isomorphisms is an isomorphism. This means that if $U \cong V$ and $V \cong W$, then $U \cong W$,
so that the property of being isomorphic is transitive. This property is also symmetric:
if $A : V \to W$ is an isomorphism, $A^{-1} : W \to V$ is too, so that $V \cong W$ implies $W \cong V$.
Moreover it is also reflexive: the identity map $1 : V \to V$ provides an isomorphism $V \cong V$.
Hence the property of being isomorphic is an equivalence relation.


1.2.3 Linear transformations 


An important special case of linear maps are those which map a vector space
to itself: $A : V \to V$. These linear maps are called linear transformations
(or endomorphisms). Linear transformations are very easy to visualise in
two dimensions:

[Figure: a linear transformation acting on the plane.]

A linear transformation sends the origin to the origin, straight lines to
straight lines, and parallelograms to parallelograms.

Composition of two linear transformations is another linear transformation.
In other words, we can think of composition of linear transformations
as some sort of multiplication. This multiplication obeys a property
reminiscent of the associativity V1 of vector addition. Namely, given three linear
transformations $A$, $B$ and $C$, then

$$(A \circ B) \circ C = A \circ (B \circ C) . \tag{1.3}$$

To see this simply apply both sides of the equation to $\mathbf{v} \in \mathbb{V}$ and use equation
(1.2) to obtain in both cases simply $A(B(C(\mathbf{v})))$. By analogy, we say that
composition of linear transformations is associative. Unlike vector addition,
composition is not commutative; that is, in general, $A \circ B \neq B \circ A$.
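
As a small illustration of the last two statements, here is a sketch in Python. The two transformations of the plane used here (a swap of components and a projection onto the first axis) are examples chosen for this sketch and do not appear in the notes; `compose` is a hypothetical helper implementing equation (1.2).

```python
# Linear transformations of the plane written as plain Python functions.
def swap(v):
    """A: exchange the two components (reflection in the line x = y)."""
    x, y = v
    return (y, x)

def project(v):
    """B: project onto the first axis."""
    x, y = v
    return (x, 0)

def double(v):
    """C: rescale by the factor 2."""
    x, y = v
    return (2 * x, 2 * y)

def compose(f, g):
    """Return f o g: apply g first and then f, as in equation (1.2)."""
    return lambda v: f(g(v))

v = (1, 2)
a_after_b = compose(swap, project)(v)   # A(B(v)) = A((1, 0)) = (0, 1)
b_after_a = compose(project, swap)(v)   # B(A(v)) = B((2, 1)) = (2, 0)

# Associativity (1.3): both bracketings give A(B(C(v))).
left = compose(compose(swap, project), double)(v)
right = compose(swap, compose(project, double))(v)
```

The two orders of composition give different vectors, while the two bracketings agree.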

Let $\mathbf{1} : \mathbb{V} \to \mathbb{V}$ denote the identity transformation, defined by $\mathbf{1}(\mathbf{v}) = \mathbf{v}$
for all $\mathbf{v} \in \mathbb{V}$. Clearly,

$$\mathbf{1} \circ A = A \circ \mathbf{1} = A , \tag{1.4}$$

for any linear transformation $A$. In other words, $\mathbf{1}$ is an identity for the
composition of linear transformations. Given a linear transformation
$A : \mathbb{V} \to \mathbb{V}$, it may happen that there is a linear transformation $B : \mathbb{V} \to \mathbb{V}$ such
that

$$B \circ A = A \circ B = \mathbf{1} . \tag{1.5}$$

If this is the case, we say that $A$ is invertible, and we call $B$ its inverse. We
then write $B = A^{-1}$.

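The composition of two invertible linear transformations is again invertible.
Indeed one has

$$(A \circ B)^{-1} = B^{-1} \circ A^{-1} .$$

To show this we compute

$$\begin{aligned}
(B^{-1} \circ A^{-1}) \circ (A \circ B) &= B^{-1} \circ (A^{-1} \circ (A \circ B)) && \text{(by equation (1.3))} \\
&= B^{-1} \circ ((A^{-1} \circ A) \circ B) && \text{(by equation (1.3))} \\
&= B^{-1} \circ (\mathbf{1} \circ B) && \text{(by equation (1.5))} \\
&= B^{-1} \circ B && \text{(by equation (1.4))} \\
&= \mathbf{1} , && \text{(by equation (1.5))}
\end{aligned}$$

and similarly

$$\begin{aligned}
(A \circ B) \circ (B^{-1} \circ A^{-1}) &= A \circ (B \circ (B^{-1} \circ A^{-1})) && \text{(by equation (1.3))} \\
&= A \circ ((B \circ B^{-1}) \circ A^{-1}) && \text{(by equation (1.3))} \\
&= A \circ (\mathbf{1} \circ A^{-1}) && \text{(by equation (1.5))} \\
&= A \circ A^{-1} && \text{(by equation (1.4))} \\
&= \mathbf{1} . && \text{(by equation (1.5))}
\end{aligned}$$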


This shows that the invertible transformations of a vector space $\mathbb{V}$ form a group, called
the general linear group of $\mathbb{V}$ and written $\mathrm{GL}(\mathbb{V})$.


A group is a set $G$ whose elements are called group elements, together with an operation
called group multiplication and written simply as

$$\begin{aligned}
\text{group multiplication} : G \times G &\to G \\
(x, y) &\mapsto xy
\end{aligned}$$

satisfying the following three axioms:

G1 group multiplication is associative:

$$(xy)z = x(yz) \quad \text{for all group elements $x$, $y$ and $z$.}$$

G2 there exists an identity element $e \in G$ such that

$$ex = xe = x \quad \text{for all group elements $x$.}$$

G3 every group element $x$ has an inverse, denoted $x^{-1}$ and obeying

$$x^{-1} x = x x^{-1} = e .$$

If in addition the group obeys a fourth axiom

G4 group multiplication is commutative:

$$xy = yx \quad \text{for all group elements $x$ and $y$,}$$


then we say that the group is commutative or abelian, in honour of the Norwegian math- 
ematician Niels Henrik Abel (1802-1829). 


When the group is abelian, the group multiplication is usually written as a group addition:
$x + y$ instead of $xy$. Notice that axioms V1-V4 for a vector space say that, under vector
addition, a vector space is an abelian group.


Groups are extremely important objects in both mathematics and physics. The notion of a
group is an 'algebraic' concept, yet its uses transcend algebra; for example, it was using the theory
of groups that quarks were originally postulated in particle physics. The fact that we
now think of quarks as elementary particles and not simply as mathematical constructs is
proof of how far group theory has become a part of our description of nature at its most
fundamental.
1.2.4 The vector space of linear maps


Now we point out that linear maps themselves also form a vector space! In
order to do this, we have to produce the two operations: vector addition and
scalar multiplication, and show that they satisfy the eight axioms V1-V8.
Let $A$ and $B$ be linear maps $\mathbb{V} \to \mathbb{W}$, let $\lambda \in \mathbb{R}$ be a scalar, and let $\mathbf{v} \in \mathbb{V}$ be
any vector. Then we define the two operations by

$$\text{(addition)} \qquad (A + B)(\mathbf{v}) = A(\mathbf{v}) + B(\mathbf{v}) , \tag{1.6}$$

$$\text{(scalar multiplication)} \qquad (\lambda A)(\mathbf{v}) = \lambda A(\mathbf{v}) . \tag{1.7}$$

Having defined these two operations we must check that the axioms are
satisfied. We leave this as an exercise, except to note that the zero vector
is the transformation which sends every $\mathbf{v} \in \mathbb{V}$ to $\mathbf{0} \in \mathbb{W}$. The rest of the
axioms follow from the fact that $\mathbb{W}$ is a vector space.


This is a general mathematical fact: the space of functions $f : X \to Y$ always inherits
whatever algebraic structures Y possesses simply by defining the operations pointwise in 


Let $\mathcal{L}(\mathbb{V}, \mathbb{W})$ denote the vector space of linear maps $\mathbb{V} \to \mathbb{W}$. What is its
dimension? We will see in the next section when we talk about matrices that
its dimension is given by the product of the dimensions of $\mathbb{V}$ and $\mathbb{W}$:

$$\dim \mathcal{L}(\mathbb{V}, \mathbb{W}) = \dim \mathbb{V} \, \dim \mathbb{W} . \tag{1.8}$$

In particular the space $\mathcal{L}(\mathbb{V}, \mathbb{V})$ of linear transformations of $\mathbb{V}$ has dimension
$(\dim \mathbb{V})^2$. We will call this space $\mathcal{L}(\mathbb{V})$ from now on.

Because $\mathcal{L}(\mathbb{V})$ is a vector space, its elements can be added and, as we saw
above, composition allows us to multiply them too. It turns out that these
two operations are compatible:

$$A \circ (B + C) = (A \circ B) + (A \circ C) \tag{1.9}$$

$$(A + B) \circ C = (A \circ C) + (B \circ C) . \tag{1.10}$$


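Let us prove the left and right distributivity properties. Let $A$, $B$, and $C$ be linear
transformations of a vector space $\mathbb{V}$ and let $\mathbf{v} \in \mathbb{V}$ be an arbitrary vector. Then

$$\begin{aligned}
(A \circ (B + C))(\mathbf{v}) &= A((B + C)(\mathbf{v})) && \text{(by equation (1.2))} \\
&= A(B(\mathbf{v}) + C(\mathbf{v})) && \text{(by equation (1.6))} \\
&= A(B(\mathbf{v})) + A(C(\mathbf{v})) && \text{(because $A$ is linear)} \\
&= (A \circ B)(\mathbf{v}) + (A \circ C)(\mathbf{v}) , && \text{(by equation (1.2))}
\end{aligned}$$

which proves (1.9). Similarly

$$\begin{aligned}
((A + B) \circ C)(\mathbf{v}) &= (A + B)(C(\mathbf{v})) && \text{(by equation (1.2))} \\
&= A(C(\mathbf{v})) + B(C(\mathbf{v})) && \text{(by equation (1.6))} \\
&= (A \circ C)(\mathbf{v}) + (B \circ C)(\mathbf{v}) , && \text{(by equation (1.2))}
\end{aligned}$$

which proves (1.10).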


Composition of linear transformations is also compatible with scalar
multiplication:

$$(\lambda A) \circ B = A \circ (\lambda B) = \lambda (A \circ B) . \tag{1.11}$$


In fact, we can summarise the properties (1.9), (1.10) and (1.11) in a very simple way
using concepts we have already introduced. Given a linear transformation $A$ of $\mathbb{V}$ we will
define two operations on $\mathcal{L}(\mathbb{V})$, left and right multiplication by $A$, as follows:

$$\begin{aligned}
L_A : \mathcal{L}(\mathbb{V}) &\to \mathcal{L}(\mathbb{V}) & \qquad R_A : \mathcal{L}(\mathbb{V}) &\to \mathcal{L}(\mathbb{V}) \\
B &\mapsto A \circ B & B &\mapsto B \circ A .
\end{aligned}$$

Then equations (1.9), (1.10) and (1.11) simply say that $L_A$ and $R_A$ are linear
transformations of $\mathcal{L}(\mathbb{V})$!


The vector space $\mathcal{L}(\mathbb{V})$ of linear transformations of $\mathbb{V}$ together with the operation of
composition, the identity $\mathbf{1}$, the distributive properties (1.9) and (1.10), and the condition
(1.11) is an associative algebra with identity.


An algebra is a vector space $\mathcal{A}$ together with a multiplication

$$\begin{aligned}
\text{multiplication} : \mathcal{A} \times \mathcal{A} &\to \mathcal{A} \\
(A, B) &\mapsto AB ,
\end{aligned}$$

obeying the following axioms, where $A, B, C \in \mathcal{A}$ and $\lambda \in \mathbb{R}$:

A1 (left distributivity) $A(B + C) = AB + AC$;
A2 (right distributivity) $(A + B)C = AC + BC$;
A3 $A(\lambda B) = (\lambda A)B = \lambda (AB)$.

If in addition $\mathcal{A}$ obeys the axiom

A4 (identity) There exists $\mathbf{1} \in \mathcal{A}$ such that $\mathbf{1} A = A \mathbf{1} = A$;

then it is an algebra with identity. If instead $\mathcal{A}$ obeys the axiom

A5 (associativity) $A(BC) = (AB)C$;

it is an associative algebra. Finally if it obeys all five axioms, it is an associative algebra
with identity.


It is a general fact that the invertible elements of an associative algebra with identity form
a group.
1.2.5 Matrices


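Matrices are intimately linked to linear maps. Let $A : \mathbb{V} \to \mathbb{W}$ be a linear
map between two finite-dimensional vector spaces. Let $\{\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_N\}$ be
a basis for $\mathbb{V}$ and let $\{\mathbf{f}_1, \mathbf{f}_2, \ldots, \mathbf{f}_M\}$ be a basis for $\mathbb{W}$. Let us write each
$A(\mathbf{e}_i)$ as a linear combination of the basis elements $\{\mathbf{f}_j\}$:

$$A(\mathbf{e}_i) = \sum_{j=1}^{M} A_{ji} \, \mathbf{f}_j , \tag{1.12}$$

where we have introduced a real number $A_{ji}$ for each $i = 1, 2, \ldots, N$ and
$j = 1, 2, \ldots, M$, a total of $NM$ real numbers. Now let $\mathbf{v}$ be a vector in $\mathbb{V}$ and
consider its image $\mathbf{w} = A(\mathbf{v})$ under $A$. We can expand both $\mathbf{v}$ and $\mathbf{w}$ as
linear combinations of the respective bases:

$$\mathbf{v} = \sum_{i=1}^{N} v_i \, \mathbf{e}_i \qquad \text{and} \qquad \mathbf{w} = \sum_{j=1}^{M} w_j \, \mathbf{f}_j . \tag{1.13}$$

Let us now express the $w_j$ in terms of the $v_i$:

$$\begin{aligned}
\mathbf{w} &= A(\mathbf{v}) \\
&= A\Big( \sum_{i=1}^{N} v_i \, \mathbf{e}_i \Big) && \text{(by the first equation in (1.13))} \\
&= \sum_{i=1}^{N} A(v_i \, \mathbf{e}_i) && \text{(by L1)} \\
&= \sum_{i=1}^{N} v_i \, A(\mathbf{e}_i) && \text{(by L2)} \\
&= \sum_{i=1}^{N} v_i \sum_{j=1}^{M} A_{ji} \, \mathbf{f}_j && \text{(by equation (1.12))} \\
&= \sum_{j=1}^{M} \Big( \sum_{i=1}^{N} A_{ji} v_i \Big) \mathbf{f}_j , && \text{(rearranging the sums)}
\end{aligned}$$

whence comparing with the second equation in (1.13) we obtain the desired
result:

$$w_j = \sum_{i=1}^{N} A_{ji} v_i . \tag{1.14}$$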
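To visualise this equation, let us arrange the components $\{v_i\}$ and $\{w_j\}$ of $\mathbf{v}$
and $\mathbf{w}$ as 'column vectors' $\mathsf{v}$ and $\mathsf{w}$, and the real numbers $A_{ji}$ as an $M \times N$
matrix $\mathsf{A}$. Then equation (1.14) can be written as

$$\mathsf{w} = \mathsf{A} \mathsf{v} ,$$

or explicitly as

$$\begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_M \end{pmatrix} =
\begin{pmatrix}
A_{11} & A_{12} & \cdots & A_{1N} \\
A_{21} & A_{22} & \cdots & A_{2N} \\
\vdots & \vdots & \ddots & \vdots \\
A_{M1} & A_{M2} & \cdots & A_{MN}
\end{pmatrix}
\begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_N \end{pmatrix} .$$

Therefore the matrix

$$\mathsf{A} = \begin{pmatrix}
A_{11} & A_{12} & \cdots & A_{1N} \\
A_{21} & A_{22} & \cdots & A_{2N} \\
\vdots & \vdots & \ddots & \vdots \\
A_{M1} & A_{M2} & \cdots & A_{MN}
\end{pmatrix}$$

represents the linear map $A : \mathbb{V} \to \mathbb{W}$ relative to the bases $\{\mathbf{e}_i\}$ and $\{\mathbf{f}_j\}$ of $\mathbb{V}$
and $\mathbb{W}$. It is important to stress that the linear map $A$ is more fundamental
than the matrix $\mathsf{A}$. If we choose different bases, the matrix for the linear map
will change (we will see this in detail below), but the map itself does not.
However if we fix bases for $\mathbb{V}$ and $\mathbb{W}$, then there is a one-to-one correspondence
between linear maps $\mathbb{V} \to \mathbb{W}$ and $M \times N$ matrices.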
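
Equation (1.14) translates directly into code. The following is a minimal sketch in Python, storing a matrix as a list of rows; the function name `apply_matrix` and the particular numbers are assumptions made for illustration only.

```python
def apply_matrix(A, v):
    """Return w with w[j] = sum_i A[j][i] * v[i], as in equation (1.14)."""
    return [sum(A[j][i] * v[i] for i in range(len(v))) for j in range(len(A))]

# A hypothetical map from a 3-dimensional space to a 2-dimensional one (M = 2, N = 3).
A = [[1, 0, 2],
     [0, 3, -1]]

v = [1, 1, 1]
w = apply_matrix(A, v)               # [1 + 0 + 2, 0 + 3 - 1] = [3, 2]

# Column i of the matrix holds the components of A(e_i), as in equation (1.12).
e2 = [0, 1, 0]
image_of_e2 = apply_matrix(A, e2)    # [0, 3], the second column of A
```

Applying the matrix to a basis column vector picks out the corresponding column, which is just equation (1.12) read in components.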


The commuting square: linear maps to matrices 
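We saw in Section 1.2.4 that the space $\mathcal{L}(\mathbb{V}, \mathbb{W})$ of linear maps $\mathbb{V} \to \mathbb{W}$ is
a vector space in its own right. How are the operations of vector addition
and scalar multiplication defined for the matrices? It turns out that they
are defined entry-wise as for real numbers. Let us see this. The matrix
corresponding to the sum of two linear maps $A$ and $A'$ is given by

$$(A + A')(\mathbf{e}_i) = \sum_{j=1}^{M} (\mathsf{A} + \mathsf{A}')_{ji} \, \mathbf{f}_j .$$

On the other hand, from equation (1.6) we have that

$$\begin{aligned}
(A + A')(\mathbf{e}_i) &= A(\mathbf{e}_i) + A'(\mathbf{e}_i) \\
&= \sum_{j=1}^{M} A_{ji} \, \mathbf{f}_j + \sum_{j=1}^{M} A'_{ji} \, \mathbf{f}_j \\
&= \sum_{j=1}^{M} (A_{ji} + A'_{ji}) \, \mathbf{f}_j .
\end{aligned}$$

Therefore we see that the matrix of the sum is the sum of the matrices:

$$(\mathsf{A} + \mathsf{A}')_{ji} = A_{ji} + A'_{ji} ;$$

or in other words, the sum of two matrices is performed entry-by-entry:

$$\begin{pmatrix}
A_{11} & A_{12} & \cdots & A_{1N} \\
A_{21} & A_{22} & \cdots & A_{2N} \\
\vdots & \vdots & \ddots & \vdots \\
A_{M1} & A_{M2} & \cdots & A_{MN}
\end{pmatrix} +
\begin{pmatrix}
A'_{11} & A'_{12} & \cdots & A'_{1N} \\
A'_{21} & A'_{22} & \cdots & A'_{2N} \\
\vdots & \vdots & \ddots & \vdots \\
A'_{M1} & A'_{M2} & \cdots & A'_{MN}
\end{pmatrix} =
\begin{pmatrix}
A_{11} + A'_{11} & A_{12} + A'_{12} & \cdots & A_{1N} + A'_{1N} \\
A_{21} + A'_{21} & A_{22} + A'_{22} & \cdots & A_{2N} + A'_{2N} \\
\vdots & \vdots & \ddots & \vdots \\
A_{M1} + A'_{M1} & A_{M2} + A'_{M2} & \cdots & A_{MN} + A'_{MN}
\end{pmatrix} .$$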

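Similarly, scalar multiplication is also performed entry-by-entry. If $\lambda \in \mathbb{R}$ is
a scalar and $A$ is a linear map, then on the one hand we have

$$(\lambda A)(\mathbf{e}_i) = \sum_{j=1}^{M} (\lambda \mathsf{A})_{ji} \, \mathbf{f}_j ,$$

but from equation (1.7) we have that

$$\begin{aligned}
(\lambda A)(\mathbf{e}_i) &= \lambda \, A(\mathbf{e}_i) \\
&= \lambda \sum_{j=1}^{M} A_{ji} \, \mathbf{f}_j \\
&= \sum_{j=1}^{M} \lambda A_{ji} \, \mathbf{f}_j ,
\end{aligned}$$

so that the matrix of $\lambda A$ is obtained from the matrix of $A$ by multiplying
each entry by $\lambda$:

$$(\lambda \mathsf{A})_{ji} = \lambda A_{ji} ;$$

explicitly,

$$\lambda \begin{pmatrix}
A_{11} & A_{12} & \cdots & A_{1N} \\
A_{21} & A_{22} & \cdots & A_{2N} \\
\vdots & \vdots & \ddots & \vdots \\
A_{M1} & A_{M2} & \cdots & A_{MN}
\end{pmatrix} =
\begin{pmatrix}
\lambda A_{11} & \lambda A_{12} & \cdots & \lambda A_{1N} \\
\lambda A_{21} & \lambda A_{22} & \cdots & \lambda A_{2N} \\
\vdots & \vdots & \ddots & \vdots \\
\lambda A_{M1} & \lambda A_{M2} & \cdots & \lambda A_{MN}
\end{pmatrix} .$$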
The vector space of $M \times N$ matrices has a 'canonical' basis given by the
matrices $\mathsf{E}_{ij}$ all of whose entries are zero except for the entry sitting in the
intersection of the $j$th column and the $i$th row, which is 1. They are clearly
linearly independent and if $\mathsf{A}$ is any matrix with entries $A_{ji}$ then

$$\mathsf{A} = \sum_{i=1}^{N} \sum_{j=1}^{M} A_{ji} \, \mathsf{E}_{ji} ,$$

so that their span is the space of all $M \times N$ matrices. Therefore they form
a basis for this space. The matrices $\mathsf{E}_{ij}$ are known as elementary matrices.
Clearly there are $MN$ such matrices, whence the dimension of the space of
$M \times N$ matrices, and hence of $\mathcal{L}(\mathbb{V}, \mathbb{W})$, is $MN$ as claimed in equation (1.8).

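Now consider a third vector space $\mathbb{U}$ of dimension $P$ and with basis
$\{\mathbf{g}_1, \mathbf{g}_2, \ldots, \mathbf{g}_P\}$. Then a linear map $B : \mathbb{U} \to \mathbb{V}$ will be represented by
an $N \times P$ matrix

$$\mathsf{B} = \begin{pmatrix}
B_{11} & B_{12} & \cdots & B_{1P} \\
B_{21} & B_{22} & \cdots & B_{2P} \\
\vdots & \vdots & \ddots & \vdots \\
B_{N1} & B_{N2} & \cdots & B_{NP}
\end{pmatrix}$$

relative to the chosen bases for $\mathbb{U}$ and $\mathbb{V}$; that is,

$$B(\mathbf{g}_k) = \sum_{i=1}^{N} B_{ik} \, \mathbf{e}_i . \tag{1.15}$$

The composition $A \circ B : \mathbb{U} \to \mathbb{W}$ will now be represented by an $M \times P$ matrix
whose entries $C_{jk}$ are given by

$$(A \circ B)(\mathbf{g}_k) = \sum_{j=1}^{M} C_{jk} \, \mathbf{f}_j . \tag{1.16}$$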


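The matrix of $A \circ B$ can be expressed in terms of the matrices $\mathsf{A}$ and $\mathsf{B}$. To
see this, let us compute

$$\begin{aligned}
(A \circ B)(\mathbf{g}_k) &= A(B(\mathbf{g}_k)) && \text{(by equation (1.2))} \\
&= A\Big( \sum_{i=1}^{N} B_{ik} \, \mathbf{e}_i \Big) && \text{(by equation (1.15))} \\
&= \sum_{i=1}^{N} B_{ik} \, A(\mathbf{e}_i) && \text{(since $A$ is linear)} \\
&= \sum_{i=1}^{N} B_{ik} \sum_{j=1}^{M} A_{ji} \, \mathbf{f}_j && \text{(by equation (1.12))} \\
&= \sum_{j=1}^{M} \Big( \sum_{i=1}^{N} A_{ji} B_{ik} \Big) \mathbf{f}_j . && \text{(rearranging sums)}
\end{aligned}$$

Therefore comparing with equation (1.16) we see that

$$C_{jk} = \sum_{i=1}^{N} A_{ji} B_{ik} ,$$

which is nothing else but matrix multiplication:

$$\begin{pmatrix}
C_{11} & C_{12} & \cdots & C_{1P} \\
C_{21} & C_{22} & \cdots & C_{2P} \\
\vdots & \vdots & \ddots & \vdots \\
C_{M1} & C_{M2} & \cdots & C_{MP}
\end{pmatrix} =
\begin{pmatrix}
A_{11} & A_{12} & \cdots & A_{1N} \\
A_{21} & A_{22} & \cdots & A_{2N} \\
\vdots & \vdots & \ddots & \vdots \\
A_{M1} & A_{M2} & \cdots & A_{MN}
\end{pmatrix}
\begin{pmatrix}
B_{11} & B_{12} & \cdots & B_{1P} \\
B_{21} & B_{22} & \cdots & B_{2P} \\
\vdots & \vdots & \ddots & \vdots \\
B_{N1} & B_{N2} & \cdots & B_{NP}
\end{pmatrix} . \tag{1.17}$$

In other words,

the matrix of $A \circ B$ is the matrix product $\mathsf{A}\mathsf{B}$.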
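
The statement that composition of maps corresponds to the matrix product can be checked numerically. This is a sketch under the assumption that matrices are stored as lists of rows; `apply_matrix` and `matmul` are hypothetical helper names and the entries are arbitrary.

```python
def apply_matrix(A, v):
    """w[j] = sum_i A[j][i] * v[i], as in equation (1.14)."""
    return [sum(A[j][i] * v[i] for i in range(len(v))) for j in range(len(A))]

def matmul(A, B):
    """Matrix product: C[j][k] = sum_i A[j][i] * B[i][k]."""
    return [[sum(A[j][i] * B[i][k] for i in range(len(B)))
             for k in range(len(B[0]))] for j in range(len(A))]

A = [[1, 2, 0],
     [0, 1, -1]]          # M x N = 2 x 3
B = [[1, 0],
     [2, 1],
     [0, 3]]              # N x P = 3 x 2

C = matmul(A, B)          # M x P = 2 x 2

u = [1, -1]
direct = apply_matrix(A, apply_matrix(B, u))   # first B, then A
via_product = apply_matrix(C, u)               # a single application of AB
```

Both routes give the same column vector, as the derivation above predicts.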


Let us consider now linear transformations $\mathcal{L}(\mathbb{V})$ of an $N$-dimensional
vector space $\mathbb{V}$ with basis $\{\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_N\}$. Matrices representing linear
transformations $\mathbb{V} \to \mathbb{V}$ are now square $N \times N$ matrices. We can add them
and multiply them as we do real numbers, except that multiplication is not
commutative: for two matrices $\mathsf{A}$ and $\mathsf{B}$ one has that, in general, $\mathsf{A}\mathsf{B} \neq \mathsf{B}\mathsf{A}$.
Let $\mathsf{A}$ be an $N \times N$ matrix. If there exists another matrix $\mathsf{B}$ which obeys

$$\mathsf{A}\mathsf{B} = \mathsf{B}\mathsf{A} = \mathsf{I} ,$$

where $\mathsf{I}$ is the identity matrix, then we say that $\mathsf{A}$ is invertible. Its inverse
$\mathsf{B}$ is written $\mathsf{A}^{-1}$. A matrix which is not invertible is called singular. Clearly
a matrix is singular if and only if its determinant is zero.

A useful fact is that a matrix is invertible if and only if its determinant
is different from zero. This allows us to show that the product of invertible
elements is again invertible. To see this notice that the determinant of a
product is the product of the determinants:

$$\det(\mathsf{A}\mathsf{B}) = \det \mathsf{A} \, \det \mathsf{B} , \tag{1.18}$$

and that this is not zero because neither $\det \mathsf{A}$ nor $\det \mathsf{B}$ is. In fact, the
inverse of a product $\mathsf{A}\mathsf{B}$ is given by

$$(\mathsf{A}\mathsf{B})^{-1} = \mathsf{B}^{-1} \mathsf{A}^{-1} . \tag{1.19}$$

(Notice the order!)

Matrices, just like $\mathcal{L}(\mathbb{V})$, form an associative algebra with identity. The algebra of $N \times N$
real matrices is denoted $\mathrm{Mat}_N(\mathbb{R})$. The invertible elements form a group, which is denoted
$\mathrm{GL}_N(\mathbb{R})$, the general linear group of $\mathbb{R}^N$.
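
Equations (1.18) and (1.19) can be checked on a small example. The sketch below is restricted to $2 \times 2$ matrices and uses exact rational arithmetic; the helper names `matmul`, `det_2x2` and `inverse_2x2`, and the two matrices, are assumptions made for this illustration.

```python
from fractions import Fraction

def matmul(A, B):
    """Matrix product: C[j][k] = sum_i A[j][i] * B[i][k]."""
    return [[sum(A[j][i] * B[i][k] for i in range(len(B)))
             for k in range(len(B[0]))] for j in range(len(A))]

def det_2x2(M):
    """Determinant of a 2 x 2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def inverse_2x2(M):
    """Inverse of a 2 x 2 matrix; raises if the matrix is singular."""
    det = det_2x2(M)
    if det == 0:
        raise ValueError("matrix is singular")
    return [[Fraction(M[1][1]) / det, Fraction(-M[0][1]) / det],
            [Fraction(-M[1][0]) / det, Fraction(M[0][0]) / det]]

A = [[2, 1], [1, 1]]      # det A = 1
B = [[1, 2], [3, 4]]      # det B = -2
AB = matmul(A, B)

# (1.18): the determinant is multiplicative.
dets_agree = det_2x2(AB) == det_2x2(A) * det_2x2(B)

# (1.19): note the reversed order on the right-hand side.
right_order = matmul(inverse_2x2(B), inverse_2x2(A))
wrong_order = matmul(inverse_2x2(A), inverse_2x2(B))
```

Here `inverse_2x2(AB)` agrees with `right_order` but not with `wrong_order`, which is why the order in (1.19) matters.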


1.2.6 Change of basis 


We mentioned above that a linear map is more fundamental than the matrix 
representing it relative to a chosen basis, for the matrix changes when we 
change the basis but the linear map remains unchanged. In this Section 
we will explore how the matrix of a linear map changes as we change the 
basis. We will restrict ourselves to linear transformations, but the results 
here extend straightforwardly to linear maps between different vector spaces. 

Let $\mathbb{V}$ be an $N$-dimensional vector space with basis $\{\mathbf{e}_i\}$, and let $A : \mathbb{V} \to \mathbb{V}$
be a linear transformation with matrix $\mathsf{A}$ relative to this basis. Let $\{\mathbf{e}'_i\}$ be
another basis. We want to know what the matrix $\mathsf{A}'$ representing $A$ relative
to this new basis is. By definition, the matrix $\mathsf{A}'$ has entries $A'_{ji}$ given by

$$A(\mathbf{e}'_i) = \sum_{j=1}^{N} A'_{ji} \, \mathbf{e}'_j . \tag{1.20}$$

Because $\{\mathbf{e}_i\}$ is a basis, we can express each element $\mathbf{e}'_i$ of the primed basis
in terms of them:

$$\mathbf{e}'_i = \sum_{j=1}^{N} S_{ji} \, \mathbf{e}_j , \tag{1.21}$$

for some $N^2$ numbers $S_{ji}$. We have written this equation in such a way
that it looks as if $S_{ji}$ are the entries of a matrix. This is with good reason.
Let $S : \mathbb{V} \to \mathbb{V}$ be the linear transformation defined by $S(\mathbf{e}_i) = \mathbf{e}'_i$ for
$i = 1, 2, \ldots, N$. Then using the explicit expression for $\mathbf{e}'_i$ we see that

$$S(\mathbf{e}_i) = \sum_{j=1}^{N} S_{ji} \, \mathbf{e}_j ,$$

so that $S_{ji}$ are indeed the entries of a matrix $\mathsf{S}$ relative to the basis $\{\mathbf{e}_i\}$.
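We can compute both sides of equation (1.20) separately and compare. The
left-hand side gives

$$\begin{aligned}
A(\mathbf{e}'_i) &= A\Big( \sum_{j=1}^{N} S_{ji} \, \mathbf{e}_j \Big) && \text{(by equation (1.21))} \\
&= \sum_{j=1}^{N} S_{ji} \, A(\mathbf{e}_j) && \text{(since $A$ is linear)} \\
&= \sum_{j=1}^{N} S_{ji} \sum_{k=1}^{N} A_{kj} \, \mathbf{e}_k && \text{(by equation (1.12))} \\
&= \sum_{k=1}^{N} \sum_{j=1}^{N} A_{kj} S_{ji} \, \mathbf{e}_k . && \text{(rearranging sums)}
\end{aligned}$$

On the other hand, the right-hand side gives

$$\begin{aligned}
\sum_{j=1}^{N} A'_{ji} \, \mathbf{e}'_j &= \sum_{j=1}^{N} A'_{ji} \sum_{k=1}^{N} S_{kj} \, \mathbf{e}_k && \text{(by equation (1.21))} \\
&= \sum_{k=1}^{N} \sum_{j=1}^{N} S_{kj} A'_{ji} \, \mathbf{e}_k . && \text{(rearranging sums)}
\end{aligned}$$

Comparing the two sides, we see that

$$\sum_{j=1}^{N} A_{kj} S_{ji} = \sum_{j=1}^{N} S_{kj} A'_{ji} ,$$

or in terms of matrices,

$$\mathsf{A}\mathsf{S} = \mathsf{S}\mathsf{A}' . \tag{1.22}$$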


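Now, $S$ is invertible. To see this use the fact that because $\{\mathbf{e}'_i\}$ is also a
basis, we can write each $\mathbf{e}_i$ in terms of the $\{\mathbf{e}'_i\}$:

$$\mathbf{e}_i = \sum_{j=1}^{N} T_{ji} \, \mathbf{e}'_j . \tag{1.23}$$

By the same argument as above, the $N^2$ numbers $T_{ji}$ are the entries of a
matrix which, relative to the primed basis, represents the linear transformation
$T : \mathbb{V} \to \mathbb{V}$ defined by $T(\mathbf{e}'_i) = \mathbf{e}_i$. The linear transformations $S$ and $T$ are
mutual inverses:

$$S(T(\mathbf{e}'_i)) = S(\mathbf{e}_i) = \mathbf{e}'_i \qquad \text{and} \qquad T(S(\mathbf{e}_i)) = T(\mathbf{e}'_i) = \mathbf{e}_i ,$$

so that $T \circ S = S \circ T = \mathbf{1}$; or in other words, $T = S^{-1}$.
Therefore, we can multiply both sides of equation (1.22) by $\mathsf{S}^{-1}$ on the
left to obtain

$$\mathsf{A}' = \mathsf{S}^{-1} \mathsf{A} \mathsf{S} . \tag{1.24}$$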


The operation above taking $\mathsf{A}$ to $\mathsf{A}'$ is called conjugation by $\mathsf{S}$. One says
that the matrices $\mathsf{A}$ and $\mathsf{A}'$ are conjugate. (This is not to be confused with the
notion of complex conjugation.)
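
To make the change-of-basis formula (1.24) concrete, here is a minimal Python sketch for a $2 \times 2$ example. The matrices are invented for illustration: the new basis is taken to be $\mathbf{e}'_1 = \mathbf{e}_1$ and $\mathbf{e}'_2 = \mathbf{e}_1 + \mathbf{e}_2$, so that the columns of `S` list the components of the new basis vectors as in equation (1.21). The helper names are hypothetical.

```python
from fractions import Fraction

def matmul(A, B):
    """Matrix product: C[j][k] = sum_i A[j][i] * B[i][k]."""
    return [[sum(A[j][i] * B[i][k] for i in range(len(B)))
             for k in range(len(B[0]))] for j in range(len(A))]

def inverse_2x2(M):
    """Inverse of a 2 x 2 matrix, using exact rational arithmetic."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    if det == 0:
        raise ValueError("matrix is singular")
    return [[Fraction(M[1][1]) / det, Fraction(-M[0][1]) / det],
            [Fraction(-M[1][0]) / det, Fraction(M[0][0]) / det]]

A = [[1, 2],
     [0, 3]]              # matrix of the transformation in the old basis
S = [[1, 1],
     [0, 1]]              # columns: e'_1 = e_1 and e'_2 = e_1 + e_2

# Equation (1.24): the matrix of the same transformation in the new basis.
A_new = matmul(inverse_2x2(S), matmul(A, S))

# Equation (1.22) should hold as well: A S = S A'.
lhs = matmul(A, S)
rhs = matmul(S, A_new)
```

In this particular example the new matrix happens to be diagonal, although the linear transformation itself has not changed.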


© Change of basis for linear maps. 


1.2.7 Matrix invariants 


Certain properties of square matrices do not change when we change the 
basis; one says that they are invariants of the matrix or, more precisely, of 
the linear map that the matrix represents. 

For example, the determinant is one such invariant. This can be seen
by taking the determinant of both sides of equation (1.24) and using
equation (1.18), to obtain that $\det \mathsf{A}' = \det \mathsf{A}$. This implies that also the
property of being invertible is invariant.

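Another invariant is the trace of a matrix, defined as the sum of the
diagonal elements, and written $\operatorname{tr} \mathsf{A}$. Explicitly, if $\mathsf{A}$ is given by

$$\mathsf{A} = \begin{pmatrix}
A_{11} & A_{12} & \cdots & A_{1N} \\
A_{21} & A_{22} & \cdots & A_{2N} \\
\vdots & \vdots & \ddots & \vdots \\
A_{N1} & A_{N2} & \cdots & A_{NN}
\end{pmatrix} ,$$

then its trace $\operatorname{tr} \mathsf{A}$ is given by

$$\operatorname{tr} \mathsf{A} = \sum_{i=1}^{N} A_{ii} . \tag{1.25}$$

A matrix whose trace vanishes is said to be traceless.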

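The fact that the trace is indeed an invariant will follow from some
fundamental properties of the trace, which we discuss now. The trace satisfies
the following property:

$$\operatorname{tr}(\mathsf{A}\mathsf{B}) = \operatorname{tr}(\mathsf{B}\mathsf{A}) . \tag{1.26}$$

Let us prove this. Let $A, B : \mathbb{V} \to \mathbb{V}$ be linear maps with matrices $\mathsf{A}$ and $\mathsf{B}$ relative to some
fixed basis. The matrix product $\mathsf{A}\mathsf{B}$ is the matrix of the composition $A \circ B$. Computing
the trace of the product, using equations (1.17) and (1.25), we find

$$\begin{aligned}
\operatorname{tr}(\mathsf{A}\mathsf{B}) &= \sum_{i=1}^{N} \sum_{j=1}^{N} A_{ij} B_{ji} \\
&= \sum_{j=1}^{N} \sum_{i=1}^{N} B_{ji} A_{ij} && \text{(rearranging the sums)} \\
&= \sum_{i=1}^{N} \sum_{j=1}^{N} B_{ij} A_{ji} && \text{(relabelling the sums)} \\
&= \operatorname{tr}(\mathsf{B}\mathsf{A}) .
\end{aligned}$$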


The fact which allows us to relabel the summation indices is known as the Shakespeare 
Theorem: “a dummy index by any other name...” The modern version of this theorem is 
due to Gertrude Stein: “a dummy index is a dummy index is a dummy index.” 


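It follows from equation (1.26) that

$$\operatorname{tr}(\mathsf{A}\mathsf{B}\mathsf{C}) = \operatorname{tr}(\mathsf{C}\mathsf{A}\mathsf{B}) = \operatorname{tr}(\mathsf{B}\mathsf{C}\mathsf{A}) , \tag{1.27}$$

which is often called the cyclic property of the trace. Using this property and
taking the trace of both sides of equation (1.24), we see that $\operatorname{tr} \mathsf{A}' = \operatorname{tr} \mathsf{A}$,
as claimed. Notice that the trace of the identity $N \times N$ matrix $\mathsf{I}$ is $\operatorname{tr} \mathsf{I} = N$.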
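
The properties (1.26) and the invariance of the trace under conjugation can be illustrated with a short Python sketch. The matrices are arbitrary choices made for this example, and `trace` and `matmul` are hypothetical helper names; the inverse of `S` is written out by hand and checked in the code.

```python
def matmul(A, B):
    """Matrix product: C[j][k] = sum_i A[j][i] * B[i][k]."""
    return [[sum(A[j][i] * B[i][k] for i in range(len(B)))
             for k in range(len(B[0]))] for j in range(len(A))]

def trace(M):
    """Sum of the diagonal entries, as in equation (1.25)."""
    return sum(M[i][i] for i in range(len(M)))

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 1]]

AB = matmul(A, B)         # [[2, 3], [4, 7]]
BA = matmul(B, A)         # [[3, 4], [4, 6]]

# A change of basis and its inverse, chosen by hand for this example.
S = [[1, 1], [0, 1]]
S_inv = [[1, -1], [0, 1]]
identity_check = matmul(S, S_inv)

A_conjugated = matmul(S_inv, matmul(A, S))   # S^{-1} A S, as in (1.24)
```

The products `AB` and `BA` differ as matrices but share the same trace, and conjugation by `S` changes the entries of `A` without changing its trace.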


Because the trace is an invariant, it actually defines a function on the vector space of linear
maps $\mathcal{L}(\mathbb{V})$. The trace of a linear map $A : \mathbb{V} \to \mathbb{V}$ is defined as the trace of any matrix of $A$
relative to some basis. Invariance says that it does not depend on which basis we choose
to compute it with respect to. As a function $\operatorname{tr} : \mathcal{L}(\mathbb{V}) \to \mathbb{R}$, the trace is actually linear. It
is an easy exercise to prove that

$$\operatorname{tr}(A + B) = \operatorname{tr} A + \operatorname{tr} B \qquad \text{and} \qquad \operatorname{tr}(\lambda A) = \lambda \operatorname{tr} A .$$
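There are other properties of a matrix which are not invariant under
arbitrary changes of basis, but are nevertheless important. For example, given
a matrix $\mathsf{A}$ let its transpose, denoted $\mathsf{A}^t$, be the matrix whose $(i,j)$ entry
equals the $(j,i)$ entry of $\mathsf{A}$. Explicitly,

$$\mathsf{A} = \begin{pmatrix}
A_{11} & A_{12} & \cdots & A_{1N} \\
A_{21} & A_{22} & \cdots & A_{2N} \\
\vdots & \vdots & \ddots & \vdots \\
A_{N1} & A_{N2} & \cdots & A_{NN}
\end{pmatrix}
\quad \Rightarrow \quad
\mathsf{A}^t = \begin{pmatrix}
A_{11} & A_{21} & \cdots & A_{N1} \\
A_{12} & A_{22} & \cdots & A_{N2} \\
\vdots & \vdots & \ddots & \vdots \\
A_{1N} & A_{2N} & \cdots & A_{NN}
\end{pmatrix} .$$

In other words, $\mathsf{A}^t$ is obtained from $\mathsf{A}$ by reflecting the matrix on the main
diagonal, and because reflection is an involution, it follows that

$$(\mathsf{A}^t)^t = \mathsf{A} . \tag{1.28}$$

It follows from the expression for $\mathsf{A}^t$ that the diagonal entries are not changed,
and hence that

$$\operatorname{tr} \mathsf{A}^t = \operatorname{tr} \mathsf{A} . \tag{1.29}$$

It is also easy to see that

$$(\mathsf{A}\mathsf{B})^t = \mathsf{B}^t \mathsf{A}^t \tag{1.30}$$

and also that

$$(\mathsf{A} + \mathsf{B})^t = \mathsf{A}^t + \mathsf{B}^t \qquad \text{and} \qquad (\lambda \mathsf{A})^t = \lambda \mathsf{A}^t . \tag{1.31}$$

From equation (1.30) it follows that

$$(\mathsf{A}^{-1})^t = (\mathsf{A}^t)^{-1} . \tag{1.32}$$

A less obvious identity is

$$\det \mathsf{A}^t = \det \mathsf{A} , \tag{1.33}$$

which follows from the fact that the row expansion of the determinant of $\mathsf{A}^t$ is
precisely the column expansion of the determinant of $\mathsf{A}$. A matrix is said to
be symmetric if $\mathsf{A}^t = \mathsf{A}$. It is said to be antisymmetric or skew-symmetric
if $\mathsf{A}^t = -\mathsf{A}$. Notice that an antisymmetric matrix is traceless, since

$$\operatorname{tr} \mathsf{A} = \operatorname{tr} \mathsf{A}^t = \operatorname{tr}(-\mathsf{A}) = -\operatorname{tr} \mathsf{A} .$$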
The converse is of course false: a traceless matrix need not be antisymmetric.

Generic matrices are neither symmetric nor antisymmetric, yet any
matrix is the sum of a symmetric matrix and an antisymmetric matrix. Indeed,
adding and subtracting $\tfrac{1}{2} \mathsf{A}^t$ in a clever way, we see that

$$\mathsf{A} = \tfrac{1}{2} (\mathsf{A} + \mathsf{A}^t) + \tfrac{1}{2} (\mathsf{A} - \mathsf{A}^t) .$$

But now, using equations (1.28) and (1.31), we see that $\tfrac{1}{2}(\mathsf{A} + \mathsf{A}^t)$ is symmetric
and $\tfrac{1}{2}(\mathsf{A} - \mathsf{A}^t)$ antisymmetric.

A matrix $\mathsf{O}$ is said to be orthogonal if its transpose is its inverse:

$$\mathsf{O}^t \mathsf{O} = \mathsf{O} \mathsf{O}^t = \mathsf{I} .$$


The property of being symmetric or antisymmetric is not invariant under arbitrary changes 
of basis, but it will be preserved under certain types of changes of basis, e.g., under 
orthogonal changes of basis. 
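
The decomposition of a matrix into symmetric and antisymmetric parts described above can be sketched in a few lines of Python. The function names and the sample matrix are assumptions made for this illustration; exact fractions are used so that the factor of one half introduces no rounding.

```python
from fractions import Fraction

def transpose(A):
    """Return the transpose: entry (i, j) of the result is entry (j, i) of A."""
    return [list(row) for row in zip(*A)]

def symmetric_part(A):
    """(A + A^t) / 2."""
    At = transpose(A)
    n = len(A)
    return [[Fraction(A[i][j] + At[i][j], 2) for j in range(n)] for i in range(n)]

def antisymmetric_part(A):
    """(A - A^t) / 2."""
    At = transpose(A)
    n = len(A)
    return [[Fraction(A[i][j] - At[i][j], 2) for j in range(n)] for i in range(n)]

A = [[1, 2],
     [4, 3]]

sym = symmetric_part(A)         # [[1, 3], [3, 3]]
anti = antisymmetric_part(A)    # [[0, -1], [1, 0]]

# Adding the two parts entry by entry recovers the original matrix.
recombined = [[sym[i][j] + anti[i][j] for j in range(2)] for i in range(2)]
```

The antisymmetric part has zeros on its diagonal, in line with the observation that antisymmetric matrices are traceless.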


1.3 Inner products 


Vectors in physics are usually defined as objects which have both a magnitude
and a direction. In that sense, they do not quite correspond to the
mathematical notion of a vector as we have been discussing above. In our
definition of an abstract vector space, as in the discussion which followed, there
is no mention of how to compute the magnitude of a vector. In this section
we will remedy this situation. Geometrically the magnitude of a vector is
simply its length. If we think of vectors as displacements, the magnitude is
the distance away from the origin. In order to define distance we will need
to introduce an inner product or scalar product, as it is often known.


1.3.1 Norms and inner products 


Let us start by considering displacements in the plane. The length $\|\mathbf{v}\|$ of the
displacement $\mathbf{v} = (v_1, v_2)$ is given by the Pythagorean theorem: $\|\mathbf{v}\|^2 = v_1^2 + v_2^2$.
This length obeys the following properties which are easily verified. First
of all it is a non-negative quantity $\|\mathbf{v}\| \geq 0$, vanishing precisely for the zero
displacement $\mathbf{0} = (0, 0)$. If we rescale $\mathbf{v}$ by a real number $\lambda$: $\lambda \mathbf{v} = (\lambda v_1, \lambda v_2)$,
its length rescales by the absolute value of $\lambda$: $\|\lambda \mathbf{v}\| = |\lambda| \, \|\mathbf{v}\|$. Finally, the
length obeys the so-called triangle inequality: $\|\mathbf{v} + \mathbf{w}\| \leq \|\mathbf{v}\| + \|\mathbf{w}\|$. This
is obvious pictorially, since the shortest distance between two points in the
plane is the straight line which joins them. In any case we will prove it later
in much more generality.

Now consider $\mathbb{R}^N$. We can define a notion of length by generalising slightly
what was done above: if $(v_1, v_2, \ldots, v_N) \in \mathbb{R}^N$, then define its length by

$$\|(v_1, v_2, \ldots, v_N)\| = \sqrt{v_1^2 + v_2^2 + \cdots + v_N^2} .$$

It again satisfies the same three properties described above.

We can formalise this into the notion of a norm in a vector space. By a
norm in a real vector space $\mathbb{V}$ we mean a function $\| \cdot \| : \mathbb{V} \to \mathbb{R}$ assigning
a real number to every vector in $\mathbb{V}$ in such a way that the following three
properties are satisfied for every vector $\mathbf{v}$ and $\mathbf{w}$ and every scalar $\lambda$:

N1 $\|\mathbf{v}\| \geq 0$, and $\|\mathbf{v}\| = 0$ if and only if $\mathbf{v} = \mathbf{0}$;
N2 $\|\lambda \mathbf{v}\| = |\lambda| \, \|\mathbf{v}\|$; and
N3 (triangle inequality) $\|\mathbf{v} + \mathbf{w}\| \leq \|\mathbf{v}\| + \|\mathbf{w}\|$.


The study of normed vector spaces is an important branch of modern 
mathematics (cf., one of the 1998 Fields Medals). In physics, however, it is 
fair to say that the more important notion is that of an inner product. If a 
norm allows us to calculate lengths, an inner product will allow us to also 
calculate angles. 

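Consider again the case of displacements in two dimensions, or equivalently
$\mathbb{R}^2$. Let us define now a function which assigns a real number to two
displacements $\mathbf{v} = (v_1, v_2)$ and $\mathbf{w} = (w_1, w_2)$:

$$\langle \mathbf{v}, \mathbf{w} \rangle := v_1 w_1 + v_2 w_2 .$$

This is usually called the dot product and is written $\mathbf{v} \cdot \mathbf{w}$. We will not use
this notation.

Clearly, $\langle \mathbf{v}, \mathbf{v} \rangle = \|\mathbf{v}\|^2$, so that this construction also incorporates a norm.
If we write the displacements using polar coordinates: $\mathbf{v} = \|\mathbf{v}\| (\cos \theta_1, \sin \theta_1)$
and similarly for $\mathbf{w} = \|\mathbf{w}\| (\cos \theta_2, \sin \theta_2)$, then we can compute:

$$\langle \mathbf{v}, \mathbf{w} \rangle = \|\mathbf{v}\| \, \|\mathbf{w}\| \cos(\theta_1 - \theta_2) . \tag{1.34}$$

In other words, $\langle \cdot, \cdot \rangle$ essentially measures the angle between the two displacements.
More generally we can consider $\mathbb{R}^N$ and define its dot product as follows. If
$\mathbf{v} = (v_1, v_2, \ldots, v_N)$ and $\mathbf{w} = (w_1, w_2, \ldots, w_N)$, then

$$\langle \mathbf{v}, \mathbf{w} \rangle := \sum_{i=1}^{N} v_i w_i = v_1 w_1 + v_2 w_2 + \cdots + v_N w_N .$$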
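The dot product satisfies the following properties. First of all it is symmetric:
$\langle \mathbf{v}, \mathbf{w} \rangle = \langle \mathbf{w}, \mathbf{v} \rangle$. It is also linear in the right-hand slot:
$\langle \mathbf{u}, \mathbf{v} + \mathbf{w} \rangle = \langle \mathbf{u}, \mathbf{v} \rangle + \langle \mathbf{u}, \mathbf{w} \rangle$ and $\langle \mathbf{u}, \lambda \mathbf{v} \rangle = \lambda \langle \mathbf{u}, \mathbf{v} \rangle$; and using the symmetry also in the
left-hand slot. It is also important that the function $\|\mathbf{v}\| := \sqrt{\langle \mathbf{v}, \mathbf{v} \rangle}$ is a
norm. The only non-obvious thing is to prove the triangle inequality for the
norm, but we will do this below in all generality. The vector space $\mathbb{R}^N$ with
the dot product defined above is called $N$-dimensional Euclidean space,
and is denoted $\mathbb{E}^N$. As a vector space, of course, $\mathbb{E}^N = \mathbb{R}^N$, but $\mathbb{E}^N$ serves to
remind us that we are talking about the vector space with the dot product.
Notice that in terms of column vectors:

$$\mathsf{v} = \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_N \end{pmatrix}
\qquad \text{and} \qquad
\mathsf{w} = \begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_N \end{pmatrix} ,$$

the dot product is given by

$$\langle \mathbf{v}, \mathbf{w} \rangle = \mathsf{v}^t \mathsf{w} =
\begin{pmatrix} v_1 & v_2 & \cdots & v_N \end{pmatrix}
\begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_N \end{pmatrix}
= \sum_{i=1}^{N} v_i w_i .$$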


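More generally, we define an inner product (or scalar product) on a real
vector space $\mathbb{V}$ to be a function $\langle \cdot, \cdot \rangle : \mathbb{V} \times \mathbb{V} \to \mathbb{R}$ taking pairs of vectors to
real numbers and obeying the following axioms:

IP1 $\langle \mathbf{v}, \mathbf{w} \rangle = \langle \mathbf{w}, \mathbf{v} \rangle$;
IP2 $\langle \mathbf{u}, \lambda \mathbf{v} + \mu \mathbf{w} \rangle = \lambda \langle \mathbf{u}, \mathbf{v} \rangle + \mu \langle \mathbf{u}, \mathbf{w} \rangle$; and
IP3 $\|\mathbf{v}\|^2 = \langle \mathbf{v}, \mathbf{v} \rangle > 0$ for all $\mathbf{v} \neq \mathbf{0}$.

Notice that IP1 and IP2 together imply that $\langle \lambda \mathbf{u} + \mu \mathbf{v}, \mathbf{w} \rangle = \lambda \langle \mathbf{u}, \mathbf{w} \rangle + \mu \langle \mathbf{v}, \mathbf{w} \rangle$.

Let $\{\mathbf{e}_i\}$ be a basis for $\mathbb{V}$. Because of IP1 and IP2, it is enough to know
what the inner product of any two basis elements is to know what it is on
any two vectors. Indeed, let $\mathbf{v} = \sum_{i=1}^{N} v_i \mathbf{e}_i$ and $\mathbf{w} = \sum_{i=1}^{N} w_i \mathbf{e}_i$ be any two
vectors. Then their inner product is given by

$$\begin{aligned}
\langle \mathbf{v}, \mathbf{w} \rangle &= \Big\langle \sum_{i=1}^{N} v_i \, \mathbf{e}_i , \sum_{j=1}^{N} w_j \, \mathbf{e}_j \Big\rangle \\
&= \sum_{i,j=1}^{N} v_i w_j \, \langle \mathbf{e}_i, \mathbf{e}_j \rangle . && \text{(using IP1,2)}
\end{aligned}$$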
In other words, all we need to know in order to compute this are the real
numbers $G_{ij} := \langle \mathbf{e}_i, \mathbf{e}_j \rangle$. These can be thought of as the entries of a matrix $\mathsf{G}$.
If we think of $\mathbf{v}$ as a column vector $\mathsf{v}$ in $\mathbb{R}^N$ whose entries are the components
of $\mathbf{v}$ relative to the basis $\{\mathbf{e}_i\}$, and the same for $\mathbf{w}$, we can compute their
inner product using matrix multiplication:

$$\langle \mathbf{v}, \mathbf{w} \rangle = \mathsf{v}^t \mathsf{G} \mathsf{w} .$$

The matrix $\mathsf{G}$ is not arbitrary. First of all from IP1 it follows that it is
symmetric:

$$G_{ij} = \langle \mathbf{e}_i, \mathbf{e}_j \rangle = \langle \mathbf{e}_j, \mathbf{e}_i \rangle = G_{ji} .$$

Furthermore IP3 imposes a strong condition known as positive-definiteness.
We will see at the end of this section what this means explicitly. Let us
simply mention that IP3 implies that the only vector which is orthogonal
to all vectors is the zero vector. This condition is weaker than IP3. It is
often desirable to relax IP3 in favour of this condition. Such inner products
are called non-degenerate. Non-degeneracy means that the matrix $\mathsf{G}$ is
invertible, so that its determinant is non-zero.

Here comes a point which confuses many people, so pay attention! Both
inner products and linear transformations are represented by matrices
relative to a basis, but they are very different objects. In particular, they
transform differently under a change of basis and this means that even if the
matrices for a linear transformation and an inner product agree numerically
in a given basis, they will generically not agree with respect to a different
basis. Let us see this in detail. Let $\{\mathbf{e}'_i\}$ be a new basis, with $\mathbf{e}'_i = S(\mathbf{e}_i)$ for
some linear transformation $S$. Relative to $\{\mathbf{e}_i\}$ the linear transformation $S$
is represented by a matrix $\mathsf{S}$ with entries $S_{ji}$ given by equation (1.21). Let $\mathsf{G}'$
denote the matrix describing the inner product in the new basis: its entries
$G'_{ij}$ are given by

$$\begin{aligned}
G'_{ij} &= \langle \mathbf{e}'_i, \mathbf{e}'_j \rangle && \text{(by definition)} \\
&= \Big\langle \sum_{k=1}^{N} S_{ki} \, \mathbf{e}_k , \sum_{l=1}^{N} S_{lj} \, \mathbf{e}_l \Big\rangle && \text{(by equation (1.21))} \\
&= \sum_{k,l=1}^{N} S_{ki} S_{lj} \, \langle \mathbf{e}_k, \mathbf{e}_l \rangle && \text{(using IP1,2)} \\
&= \sum_{k,l=1}^{N} S_{ki} \, G_{kl} \, S_{lj} .
\end{aligned}$$

In other words,

$$\mathsf{G}' = \mathsf{S}^t \mathsf{G} \mathsf{S} , \tag{1.35}$$

to be contrasted with the analogous formula (1.24) for the behaviour of the
matrix of a linear transformation under a change of basis.


Notice, however, that under an orthogonal change of basis, so that $\mathsf{S}^{-1} = \mathsf{S}^t$, both
inner products and linear maps transform the same way.
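
The contrast between the two transformation laws can be seen in a small Python sketch. Here one and the same numerical matrix `M` is interpreted first as the matrix of an inner product and then as the matrix of a linear transformation; `M`, the change of basis `S`, the rotation `R` and the helper names are all choices made for this illustration, and the inverses are written out by hand.

```python
def matmul(A, B):
    """Matrix product: C[j][k] = sum_i A[j][i] * B[i][k]."""
    return [[sum(A[j][i] * B[i][k] for i in range(len(B)))
             for k in range(len(B[0]))] for j in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

M = [[2, 0],
     [0, 1]]              # the same numbers, read in two different ways

# A non-orthogonal change of basis and its inverse.
S = [[1, 1], [0, 1]]
S_inv = [[1, -1], [0, 1]]

as_inner_product = matmul(transpose(S), matmul(M, S))    # S^t M S, as in (1.35)
as_transformation = matmul(S_inv, matmul(M, S))          # S^{-1} M S, as in (1.24)

# An orthogonal change of basis: rotation by 90 degrees, whose inverse is its transpose.
R = [[0, -1], [1, 0]]
R_inv = transpose(R)
rotated_inner_product = matmul(transpose(R), matmul(M, R))
rotated_transformation = matmul(R_inv, matmul(M, R))
```

For the non-orthogonal `S` the two results differ, while for the rotation `R` they coincide, as the remark above states.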


1.3.2 The Cauchy-Schwarz and triangle inequalities


In this section we prove that $\|\mathbf{v}\| = \sqrt{\langle \mathbf{v}, \mathbf{v} \rangle}$ is indeed a norm. Because
axioms N1 and N2 are obvious from the axioms of the inner product, all we
really need to prove is the triangle inequality. This inequality will follow
trivially from another inequality called the Cauchy-Schwarz inequality, and
which is itself quite useful. Consider equation (1.34). Because the cosine
function obeys $|\cos \theta| \leq 1$, we can deduce an inequality from equation (1.34).
Namely that for any two displacements $\mathbf{v}$ and $\mathbf{w}$ in the plane,

$$|\langle \mathbf{v}, \mathbf{w} \rangle| \leq \|\mathbf{v}\| \, \|\mathbf{w}\| ,$$

with equality if and only if the angle between the two displacements is zero or $\pi$;
in other words, if the displacements are collinear. The above inequality
is called the two-dimensional Cauchy-Schwarz inequality. This inequality
actually holds in any vector space with an inner product (even if it is
infinite-dimensional).

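Let $\mathbf{v}$ and $\mathbf{w}$ be any two vectors in a vector space $\mathbb{V}$ with an inner product
$\langle \cdot, \cdot \rangle$. Let $\lambda$ be a real number and let us consider the following inequality:

$$\begin{aligned}
0 &\leq \|\mathbf{v} - \lambda \mathbf{w}\|^2 \\
&= \langle \mathbf{v} - \lambda \mathbf{w}, \mathbf{v} - \lambda \mathbf{w} \rangle && \text{(by definition)} \\
&= \|\mathbf{v}\|^2 + \|\lambda \mathbf{w}\|^2 - 2 \langle \mathbf{v}, \lambda \mathbf{w} \rangle && \text{(expanding and using IP1,2)} \\
&= \|\mathbf{v}\|^2 + \lambda^2 \|\mathbf{w}\|^2 - 2 \lambda \langle \mathbf{v}, \mathbf{w} \rangle . && \text{(using IP2)}
\end{aligned}$$

Now we want to make a clever choice of $\lambda$ which allows us to partially cancel
the last two terms against each other. This way we can hope to get an
inequality involving only two terms. Assuming $\mathbf{w} \neq \mathbf{0}$ (otherwise there is
nothing to prove), the clever choice of $\lambda$ turns out to be
$\lambda = \langle \mathbf{v}, \mathbf{w} \rangle / \|\mathbf{w}\|^2$. Inserting this into the above equation and rearranging the
terms a little, we obtain the following inequality

$$\|\mathbf{v}\|^2 \geq \frac{\langle \mathbf{v}, \mathbf{w} \rangle^2}{\|\mathbf{w}\|^2} .$$

Taking the (positive) square root and rearranging we arrive at the
Cauchy-Schwarz inequality:

$$|\langle \mathbf{v}, \mathbf{w} \rangle| \leq \|\mathbf{v}\| \, \|\mathbf{w}\| . \tag{1.36}$$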


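The triangle inequality now follows easily. Let us expand $\|\mathbf{v} + \mathbf{w}\|^2$ as
follows:

$$\begin{aligned}
\|\mathbf{v} + \mathbf{w}\|^2 &= \langle \mathbf{v} + \mathbf{w}, \mathbf{v} + \mathbf{w} \rangle \\
&= \|\mathbf{v}\|^2 + \|\mathbf{w}\|^2 + 2 \langle \mathbf{v}, \mathbf{w} \rangle && \text{(using IP1,2)} \\
&\leq \|\mathbf{v}\|^2 + \|\mathbf{w}\|^2 + 2 |\langle \mathbf{v}, \mathbf{w} \rangle| && \text{(since $x \leq |x|$)} \\
&\leq \|\mathbf{v}\|^2 + \|\mathbf{w}\|^2 + 2 \|\mathbf{v}\| \, \|\mathbf{w}\| && \text{(using Cauchy-Schwarz)} \\
&= (\|\mathbf{v}\| + \|\mathbf{w}\|)^2 .
\end{aligned}$$

Taking the (positive) square root we arrive at the triangle inequality:

$$\|\mathbf{v} + \mathbf{w}\| \leq \|\mathbf{v}\| + \|\mathbf{w}\| . \tag{1.37}$$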
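
Both inequalities, and the role played by the clever choice of $\lambda$, can be checked numerically. The following Python sketch uses the ordinary dot product on $\mathbb{R}^2$; the two vectors and the helper names `inner` and `norm` are arbitrary choices made for this illustration.

```python
import math

def inner(v, w):
    """Dot product of two vectors given as lists of components."""
    return sum(a * b for a, b in zip(v, w))

def norm(v):
    """Norm induced by the inner product."""
    return math.sqrt(inner(v, v))

v = [3, 4]
w = [1, -2]

# Cauchy-Schwarz (1.36) and the triangle inequality (1.37).
cauchy_schwarz_holds = abs(inner(v, w)) <= norm(v) * norm(w)
v_plus_w = [a + b for a, b in zip(v, w)]
triangle_holds = norm(v_plus_w) <= norm(v) + norm(w)

# With lambda = <v, w> / |w|^2, the quantity |v - lambda w|^2 equals
# |v|^2 - <v, w>^2 / |w|^2, which is what makes the proof work.
lam = inner(v, w) / inner(w, w)
residual = [a - lam * b for a, b in zip(v, w)]
left_side = inner(residual, residual)
right_side = inner(v, v) - inner(v, w) ** 2 / inner(w, w)
```

The vector `residual` is the part of `v` orthogonal to `w`, so its squared length is non-negative, which is precisely the starting point of the proof.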


1.3.3 Orthonormal bases and Gram-Schmidt 


Throughout this section we will let $\mathbb{V}$ be an $N$-dimensional real vector space
with an inner product $\langle \cdot, \cdot \rangle$.

We say that two vectors $\mathbf{v}$ and $\mathbf{w}$ are orthogonal (written $\mathbf{v} \perp \mathbf{w}$) if their
inner product vanishes: $\langle \mathbf{v}, \mathbf{w} \rangle = 0$. Any nonzero vector can be normalised
to have unit norm simply by dividing by its norm: $\mathbf{v} / \|\mathbf{v}\|$ has unit norm. A
basis $\{\mathbf{e}_i\}$ is said to be orthonormal if

$$\langle \mathbf{e}_i, \mathbf{e}_j \rangle = \delta_{ij} := \begin{cases} 1 & \text{if } i = j \\ 0 & \text{otherwise.} \end{cases} \tag{1.38}$$

In other words, the basis elements in an orthonormal basis are mutually
orthogonal and are normalised to unit norm. Notice that the matrix
representing the inner product relative to an orthonormal basis is the identity
matrix.

The components of a vector $\mathbf{v}$ relative to an orthonormal basis $\{\mathbf{e}_i\}$ are
very easy to compute. Let $\mathbf{v} = \sum_{i=1}^{N} v_i \mathbf{e}_i$, and take its inner product with
$\mathbf{e}_j$:

$$\begin{aligned}
\langle \mathbf{e}_j, \mathbf{v} \rangle &= \Big\langle \mathbf{e}_j, \sum_{i=1}^{N} v_i \, \mathbf{e}_i \Big\rangle \\
&= \sum_{i=1}^{N} v_i \, \langle \mathbf{e}_j, \mathbf{e}_i \rangle && \text{(using IP2)} \\
&= v_j . && \text{(using equation (1.38))}
\end{aligned}$$

This shows that orthonormal vectors are automatically linearly independent.
Indeed, suppose that $\{\mathbf{e}_i\}$ are orthonormal vectors. Then suppose that a
linear combination is the zero vector:

$$\sum_{i} \lambda_i \, \mathbf{e}_i = \mathbf{0} .$$

Taking the inner product of both sides of this equality with $\mathbf{e}_j$ we find, on
the left-hand side $\lambda_j$ and on the right-hand side 0, hence $\lambda_j = 0$ and thus the
$\{\mathbf{e}_i\}$ are linearly independent.

We now discuss an algorithmic procedure by which any basis can be
modified to yield an orthonormal basis. Let $\{f_i\}$ be any basis whatsoever
for $\mathbb{V}$. We will define iteratively a new basis $\{e_i\}$ which will
be orthonormal. The procedure starts as follows. We define

$$e_1 = \frac{f_1}{\|f_1\|} ,$$

which has unit norm by construction. We now define $e_2$ starting from $f_2$
but making it orthogonal to $e_1$ and normalising it to unit norm. A moment's
thought reveals that the correct definition is

$$e_2 = \frac{f_2 - \langle f_2, e_1\rangle\, e_1}{\|f_2 - \langle f_2, e_1\rangle\, e_1\|} .$$

It has unit norm by construction, and it is clearly orthogonal to $e_1$ because

$$\langle f_2 - \langle f_2, e_1\rangle\, e_1, e_1\rangle = \langle f_2, e_1\rangle - \langle f_2, e_1\rangle\, \|e_1\|^2 = 0 .$$


We can continue in this fashion and at each step define $e_i$ as
$f_i + \cdots$ divided by its norm, where the omitted terms are a linear
combination of the $\{e_1, e_2, \ldots, e_{i-1}\}$ defined in such a way that
$e_i$ is orthogonal to them. For a finite-dimensional vector space, this
procedure stops in a finite time and we are left with an orthonormal basis
$\{e_i\}$. The general formula for the $e_i$ is

$$e_i = \frac{f_i - \sum_{j=1}^{i-1} \langle f_i, e_j\rangle\, e_j}{\bigl\|f_i - \sum_{j=1}^{i-1} \langle f_i, e_j\rangle\, e_j\bigr\|} . \tag{1.39}$$

Notice that this formula is recursive: it defines $e_i$ in terms of $f_i$ and
the $\{e_{j<i}\}$.

© Studying this formula we see that each $e_i$ is a linear combination

$$e_i = \sum_{j=1}^{i} S_{ji}\, f_j , \tag{1.40}$$
where $S_{ii}$ is positive, since it is given by $S_{ii} = 1/\|f_i + \cdots\|$.
Now let $S$ be the linear transformation defined by $S(f_i) = e_i$. Relative
to the original basis $\{f_i\}$, $S$ has a matrix $\mathsf{S}$ with entries
$S_{ji}$ defined by

$$e_i = \sum_{j=1}^{N} S_{ji}\, f_j .$$

Comparing with equation (1.40) we see that $S_{ji} = 0$ for $j > i$, so that
all the entries of $\mathsf{S}$ below the main diagonal are zero. We say that
$\mathsf{S}$ is upper triangular. The condition $S_{ii} > 0$ says that the
diagonal entries are positive.

We can turn equation (1.40) around and notice that $f_i$ is in turn given as
a linear combination of $\{e_{j \leq i}\}$. The linear transformation $T$
defined by $f_i = T(e_i)$, which is the inverse of $S$, has a matrix
$\mathsf{T}$ relative to the $\{e_i\}$ basis which is also upper triangular
with positive entries on the main diagonal. The matrix $\mathsf{G}$ with
entries $G_{ij} = \langle f_i, f_j\rangle$, representing the inner product on
the $\{f_i\}$ basis, is then given by

$$\mathsf{G} = \mathsf{T}^t\, \mathsf{T} .$$

In other words, since the $\{f_i\}$ were an arbitrary basis, $\mathsf{G}$ is
an arbitrary matrix representing an inner product. We have learned then that
this matrix can always be written as a "square" $\mathsf{T}^t \mathsf{T}$,
where $\mathsf{T}$ is an upper triangular matrix with positive entries in the
main diagonal.
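The Gram-Schmidt procedure of this section can be sketched numerically. The following is only an illustration under stated assumptions: the vector space is taken to be $\mathbb{R}^3$ with the standard dot product, and the helper names `dot` and `gram_schmidt`, as well as the sample basis `fs`, are hypothetical choices made for this example. It builds the $e_i$ from equation (1.39) and then forms the matrices $\mathsf{T}$ and $\mathsf{G}$ discussed above.

```python
import math

def dot(u, v):
    """Standard inner product on R^N (an assumption of this sketch)."""
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(fs):
    """Build an orthonormal basis from the basis `fs` following equation (1.39)."""
    es = []
    for f in fs:
        r = list(f)
        for e in es:
            c = dot(f, e)  # component of f_i along the already constructed e_j
            r = [ri - c * ei for ri, ei in zip(r, e)]
        norm = math.sqrt(dot(r, r))
        es.append([ri / norm for ri in r])
    return es

# A sample basis of R^3, chosen only for illustration.
fs = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
es = gram_schmidt(fs)
n = len(fs)

# T_ji = <e_j, f_i> are the components of f_i relative to the e basis.
T = [[dot(es[j], fs[i]) for i in range(n)] for j in range(n)]
# G_ij = <f_i, f_j> represents the inner product on the f basis.
G = [[dot(fs[i], fs[j]) for j in range(n)] for i in range(n)]
# (T^t T)_ij = sum_k T_ki T_kj
TtT = [[sum(T[k][i] * T[k][j] for k in range(n)) for j in range(n)]
       for i in range(n)]
```

Up to rounding, `T` should come out upper triangular with positive diagonal entries, and `TtT` should agree with `G`.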


1.3.4 The adjoint of a linear transformation 


Throughout this section we will let $\mathbb{V}$ be an $N$-dimensional real
vector space with an inner product $\langle\cdot,\cdot\rangle$.

Let $A : \mathbb{V} \to \mathbb{V}$ be a linear transformation. A linear
transformation is uniquely defined by its matrix elements
$\langle A(v), w\rangle$. Indeed, if $A'$ is another linear transformation
with $\langle A'(v), w\rangle = \langle A(v), w\rangle$ for all $v$ and $w$,
then we claim that $A = A'$. To see this notice that

$$\begin{aligned}
0 &= \langle A'(v), w\rangle - \langle A(v), w\rangle \\
&= \langle A'(v) - A(v), w\rangle . && \text{(using IP1,2)}
\end{aligned}$$

Since this is true for all $w$, it says that the vector $A'(v) - A(v)$ is
orthogonal to all vectors, and in particular to itself. Therefore it has zero
norm and by IP3 it is the zero vector. In other words, $A'(v) = A(v)$ for all
$v$, which means that $A = A'$.

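Given a linear transformation $A : \mathbb{V} \to \mathbb{V}$ we define its
adjoint relative to the inner product as the linear transformation
$A^\dagger : \mathbb{V} \to \mathbb{V}$ with matrix elements

$$\langle A^\dagger(v), w\rangle = \langle v, A(w)\rangle . \tag{1.41}$$

The adjoint operation obeys several properties. First of all, taking the
adjoint is an involution:

$$(A^\dagger)^\dagger = A . \tag{1.42}$$

Moreover it is a linear operation

$$(\lambda A + \mu B)^\dagger = \lambda A^\dagger + \mu B^\dagger , \tag{1.43}$$

which reverses the order of a composition:

$$(A \circ B)^\dagger = B^\dagger \circ A^\dagger . \tag{1.44}$$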


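These properties are easily proven. The method of proof consists in showing
that both sides of each equation have the same matrix elements. For example,
the matrix elements of the double adjoint $A^{\dagger\dagger}$ are given by

$$\begin{aligned}
\langle A^{\dagger\dagger}(v), w\rangle &= \langle v, A^\dagger(w)\rangle && \text{(by equation (1.41))} \\
&= \langle A^\dagger(w), v\rangle && \text{(by IP1)} \\
&= \langle w, A(v)\rangle && \text{(by equation (1.41))} \\
&= \langle A(v), w\rangle , && \text{(by IP1)}
\end{aligned}$$

whence they agree with the matrix elements of $A$.

Similarly, the matrix elements of $(\lambda A + \mu B)^\dagger$ are given by

$$\begin{aligned}
\langle (\lambda A + \mu B)^\dagger(v), w\rangle &= \langle v, (\lambda A + \mu B)(w)\rangle && \text{(by equation (1.41))} \\
&= \lambda \langle v, A(w)\rangle + \mu \langle v, B(w)\rangle && \text{(using IP2)} \\
&= \lambda \langle A^\dagger(v), w\rangle + \mu \langle B^\dagger(v), w\rangle && \text{(by equation (1.41))} \\
&= \langle (\lambda A^\dagger + \mu B^\dagger)(v), w\rangle , && \text{(using IP1,2)}
\end{aligned}$$

which agree with the matrix elements of $\lambda A^\dagger + \mu B^\dagger$.

Finally, the matrix elements of $(A \circ B)^\dagger$ are given by

$$\begin{aligned}
\langle (A \circ B)^\dagger(v), w\rangle &= \langle v, (A \circ B)(w)\rangle && \text{(by equation (1.41))} \\
&= \langle v, A(B(w))\rangle && \text{(by equation (1.2))} \\
&= \langle A^\dagger(v), B(w)\rangle && \text{(by equation (1.41))} \\
&= \langle B^\dagger(A^\dagger(v)), w\rangle && \text{(by equation (1.41))} \\
&= \langle (B^\dagger \circ A^\dagger)(v), w\rangle , && \text{(by equation (1.2))}
\end{aligned}$$

which agree with the matrix elements of $B^\dagger \circ A^\dagger$.

A linear transformation is said to be symmetric if $A^\dagger = A$. It is said
to be orthogonal if $A^\dagger \circ A = A \circ A^\dagger = \mathbf{1}$. In
particular, orthogonal transformations preserve inner products:

$$\begin{aligned}
\langle A(v), A(w)\rangle &= \langle v, A^\dagger(A(w))\rangle && \text{(by equation (1.41))} \\
&= \langle v, (A^\dagger \circ A)(w)\rangle && \text{(by equation (1.2))} \\
&= \langle v, w\rangle . && \text{(since $A$ is orthogonal)}
\end{aligned}$$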


Notice that in the above we only used the condition
$A^\dagger \circ A = \mathbf{1}$ but not $A \circ A^\dagger = \mathbf{1}$. In
a finite-dimensional vector space one implies the other, but in
infinite-dimensional vector spaces it may happen that a linear transformation
which preserves the inner product obeys $A^\dagger \circ A = \mathbf{1}$ but
does not obey $A \circ A^\dagger = \mathbf{1}$. (Maybe an example?)

To justify these names, notice that relative to an orthonormal basis the
matrix of a symmetric transformation is symmetric and the matrix of an
orthogonal transformation is orthogonal, as defined in Section 1.2.7. This
follows because the matrix of the adjoint of a linear transformation is the
transpose of the matrix of the linear transformation.

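Let us prove this. Let $\{e_i\}$ be an orthonormal basis and let
$A : \mathbb{V} \to \mathbb{V}$ be a linear transformation. The matrix
$\mathsf{A}$ of $A$ relative to this basis has entries $A_{ij}$ defined by

$$A(e_i) = \sum_{j=1}^{N} A_{ji}\, e_j .$$

The entries $A_{ij}$ are also given by matrix elements:

$$\begin{aligned}
\langle A(e_i), e_j\rangle &= \Bigl\langle \sum_{k=1}^{N} A_{ki}\, e_k, e_j \Bigr\rangle \\
&= \sum_{k=1}^{N} A_{ki} \langle e_k, e_j\rangle && \text{(using IP1,2)} \\
&= A_{ji} . && \text{(using equation (1.38))}
\end{aligned}$$

In other words, relative to an orthonormal basis, we have the following useful
formula:

$$A_{ij} = \langle e_i, A(e_j)\rangle . \tag{1.45}$$

From this it follows that the matrix of the adjoint $A^\dagger$ relative to
this basis is given by $\mathsf{A}^t$. Indeed,

$$\begin{aligned}
A^\dagger_{ij} &= \langle A^\dagger(e_j), e_i\rangle && \text{(by equation (1.45) and IP1)} \\
&= \langle e_j, A(e_i)\rangle && \text{(using equation (1.41))} \\
&= \langle A(e_i), e_j\rangle && \text{(using IP1)} \\
&= A_{ji} .
\end{aligned}$$

Therefore if $A^\dagger = A$, then $\mathsf{A} = \mathsf{A}^t$, and the matrix
is symmetric. Similarly, if $A \circ A^\dagger = A^\dagger \circ A = \mathbf{1}$,
then $\mathsf{A}^t \mathsf{A} = \mathsf{A} \mathsf{A}^t = \mathsf{I}$, and the
matrix is orthogonal.

Notice that equations (1.42), (1.43) and (1.44) for the linear transformations
are now seen to be consequences of equations (1.28), (1.31) and (1.30)
applied to their matrices relative to an orthonormal basis.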


1.3.5 Complex vector spaces 


Much of what we have been saying about vector spaces remains true if we
change the scalars and consider complex numbers instead of real numbers.
Only the notion of an inner product will have to be changed in order for it to
become useful. Inner products on complex vector spaces will be the subject
of the next section; in this one, we want to emphasise those aspects of vector
spaces which remain unchanged when we extend the scalars from the real to
the complex numbers.

As you know, complex numbers themselves can be understood as a real
vector space of dimension two; that is, as $\mathbb{R}^2$. If $z = x + iy$ is
a complex number with $x, y$ real and $i = \sqrt{-1}$, then we can think of
$z$ as the pair $(x, y) \in \mathbb{R}^2$. Addition of complex numbers
corresponds to vector addition in $\mathbb{R}^2$. Indeed, if $z = x + iy$ and
$w = u + iv$ then $z + w = (x + u) + i(y + v)$, which is precisely what we
expect from the vector addition $(x, y) + (u, v) = (x + u, y + v)$. Similarly,
multiplication by a real number $\lambda$ corresponds to scalar multiplication
in $\mathbb{R}^2$. Indeed, $\lambda z = (\lambda x) + i(\lambda y)$, which is
in agreement with $\lambda (x, y) = (\lambda x, \lambda y)$. However the
complex numbers have more structure than that of a mere vector space. Unlike
vectors in a general vector space, complex numbers can be multiplied: if
$z = x + iy$ and $w = u + iv$, then $zw = (xu - yv) + i(xv + yu)$.
Multiplication is commutative: $wz = zw$.


In a sense, complex numbers are more like matrices than vectors. Indeed,
consider the $2 \times 2$ matrices of the form

$$\begin{pmatrix} a & -b \\ b & a \end{pmatrix} .$$

If we take the matrix product

$$\begin{pmatrix} x & -y \\ y & x \end{pmatrix} \begin{pmatrix} u & -v \\ v & u \end{pmatrix} = \begin{pmatrix} xu - yv & -(xv + yu) \\ xv + yu & xu - yv \end{pmatrix} ,$$

we see that we recover the multiplication of complex numbers. Notice that the
complex number $i$ is represented by the matrix

$$\mathsf{J} = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} ,$$

which obeys $\mathsf{J}^2 = -\mathsf{I}$.
A real matrix $\mathsf{J}$ obeying $\mathsf{J}^2 = -\mathsf{I}$ is called a
complex structure.

We now briefly review some basic facts about complex numbers. You should
already be familiar with the following concepts; they are collected here just
to set the notation. As we have seen, complex numbers can be added and
multiplied. So far that is as with the real numbers, but in addition there is
a notion of complex conjugation: $z = x + iy \mapsto z^* = x - iy$. Clearly
conjugation is an involution: $(z^*)^* = z$. It also obeys $(zw)^* = z^* w^*$.
A complex number $z$ is said to be real if it is invariant under conjugation:
$z^* = z$. Similarly a complex number is said to be imaginary if $z^* = -z$.
Given $z = x + iy$, $z$ is real if and only if $y = 0$, whereas $z$ is
imaginary if and only if $x = 0$. If $z = x + iy$, $x$ is said to be the real
part of $z$, written $x = \operatorname{Re} z$, and $y$ is said to be the
imaginary part of $z$, written $y = \operatorname{Im} z$. Notice that the
imaginary part of a complex number is a real number, not an imaginary number!
Given a complex number $z$, the product $z z^*$ is real: $(z z^*)^* = z z^*$.
It is written $|z|^2$, and $|z|$ is called the modulus of $z$. If
$z = x + iy$, then $|z|^2 = x^2 + y^2$, which coincides with the squared norm
$\|(x, y)\|^2$ of the corresponding vector in the plane. Notice that the
modulus is multiplicative: $|zw| = |z||w|$, and invariant under conjugation:
$|z^*| = |z|$.
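The matrix picture of the complex numbers described above is easy to check by hand or numerically. The sketch below is an illustration only; the helper names `as_matrix`, `matmul`, `transpose` and `det` and the sample values of `z` and `w` are assumptions made for this example. Besides the product rule and $\mathsf{J}^2 = -\mathsf{I}$, it also checks two facts that follow directly from the form of the matrix: conjugation corresponds to transposition, and $|z|^2$ is the determinant.

```python
def as_matrix(z):
    """Represent z = x + iy by the real 2x2 matrix [[x, -y], [y, x]]."""
    return [[z.real, -z.imag], [z.imag, z.real]]

def matmul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(A):
    return [[A[j][i] for j in range(2)] for i in range(2)]

def det(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

z, w = complex(1, 2), complex(3, -1)  # sample values for illustration

# Matrix multiplication reproduces complex multiplication.
product_ok = matmul(as_matrix(z), as_matrix(w)) == as_matrix(z * w)
# Complex conjugation corresponds to taking the transpose.
conjugate_ok = transpose(as_matrix(z)) == as_matrix(z.conjugate())
# The squared modulus z z* is the determinant.
modulus_ok = det(as_matrix(z)) == (z * z.conjugate()).real
# The matrix J representing i squares to minus the identity.
J = as_matrix(1j)
J_squared = matmul(J, J)
```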

After this flash review of complex numbers, it is possible to define the
notion of a complex vector space. There is really very little to do.
Everything that was said in Sections 1.1 and 1.2 still holds provided we
replace real with complex everywhere. An abstract complex vector space
satisfies the same axioms, except that the scalars are now complex numbers as
opposed to real numbers. Vector subspaces work the same way. Bases and
linear independence also work in the same way, linear combinations being now
complex linear combinations. The canonical example of a complex vector space
is $\mathbb{C}^N$, the set of ordered $N$-tuples of complex numbers:
$(z_1, z_2, \ldots, z_N)$, with the operations defined slot-wise as for
$\mathbb{R}^N$. The canonical basis
$\{(1,0,\ldots,0), (0,1,\ldots,0), \ldots, (0,0,\ldots,1)\}$ still spans
$\mathbb{C}^N$, but where we now take complex linear combinations. As a result
$\mathbb{C}^N$ has (complex) dimension $N$. If we only allowed ourselves to
take real linear combinations, then in order to span $\mathbb{C}^N$ we would
need in addition the $N$ vectors
$\{(i,0,\ldots,0), (0,i,\ldots,0), \ldots, (0,0,\ldots,i)\}$, showing that as
a real vector space, $\mathbb{C}^N$ is $2N$-dimensional.

Linear maps and linear transformations are now complex linear, and matrices
and column vectors now have complex entries instead of real entries.
Matrix invariants like the trace and the determinant are now complex numbers
instead of real numbers. There is one more operation we can do with
complex matrices, and that is to take complex conjugation. If $\mathsf{A}$ is
a complex $N \times M$ matrix, then $\mathsf{A}^*$ is the $N \times M$ matrix
whose entries are simply the complex conjugates of the entries in
$\mathsf{A}$. Clearly, for square matrices,

$$\det(\mathsf{A}^*) = (\det \mathsf{A})^* \qquad\text{and}\qquad \operatorname{tr}(\mathsf{A}^*) = (\operatorname{tr} \mathsf{A})^* .$$

The only significant difference between real and complex vector spaces
appears when we introduce inner products, which we do now.


1.3.6 Hermitian inner products 


We motivated the introduction of inner products as a way to measure, in
particular, lengths of vectors. The need to compute lengths was motivated
in turn by the fact that the vectorial quantities used in physics have a
magnitude as well as a direction. Magnitudes, like anything else that one ever
measures experimentally, are positive (or at least non-negative) real
numbers. However if we were to simply extend the dot product from
$\mathbb{R}^N$ to $\mathbb{C}^N$, we would immediately notice that for
$z = (z_1, z_2, \ldots, z_N) \in \mathbb{C}^N$, the dot product with itself

$$z \cdot z = \sum_{i=1}^{N} z_i z_i ,$$

gives a complex number, not a real number. Hence we cannot understand
this as a length. One way to generate a positive real number is to define the
following inner product on $\mathbb{C}^N$:

$$\langle z, w\rangle = \sum_{i=1}^{N} z_i^*\, w_i ,$$

where $z = (z_1, z_2, \ldots, z_N)$ and $w = (w_1, w_2, \ldots, w_N)$. It is
then easy to see that now

$$\langle z, z\rangle = \sum_{i=1}^{N} z_i^*\, z_i = \sum_{i=1}^{N} |z_i|^2 ,$$

so that this is a non-negative real number, which can be interpreted as the
square of a norm. The above inner product obeys the following property, in
contrast with the dot product in $\mathbb{R}^N$: it is not symmetric, so
rather than IP1 it obeys $\langle z, w\rangle = \langle w, z\rangle^*$.

This suggests the following definition. A complex valued function
$\langle\cdot,\cdot\rangle : \mathbb{V} \times \mathbb{V} \to \mathbb{C}$
taking pairs of vectors to complex numbers is called a hermitian inner product
if the following axioms are satisfied:

HIP1 $\langle z, w\rangle = \langle w, z\rangle^*$;

HIP2 $\langle x, \lambda z + \mu w\rangle = \lambda \langle x, z\rangle + \mu \langle x, w\rangle$; and

HIP3 $\|z\|^2 = \langle z, z\rangle > 0$ for all $z \neq 0$,

where here $\lambda$ and $\mu$ are complex scalars.
Except for the fact that $\langle\cdot,\cdot\rangle$ is a complex function,
the only obvious difference is HIP1. Using HIP1 and HIP2 we see that

$$\begin{aligned}
\langle \lambda z + \mu w, x\rangle &= \langle x, \lambda z + \mu w\rangle^* && \text{(by HIP1)} \\
&= (\lambda \langle x, z\rangle + \mu \langle x, w\rangle)^* && \text{(by HIP2)} \\
&= \lambda^* \langle x, z\rangle^* + \mu^* \langle x, w\rangle^* \\
&= \lambda^* \langle z, x\rangle + \mu^* \langle w, x\rangle , && \text{(using HIP1)}
\end{aligned}$$

so that $\langle\cdot,\cdot\rangle$ is complex linear in the second slot but
only conjugate linear in the first. One says that hermitian inner products are
sesquilinear, which means 'one and a half' linear.
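As a quick numerical illustration of the axioms and of sesquilinearity, the following sketch uses the standard hermitian inner product on $\mathbb{C}^2$. The function names `inner` and `combo` and all sample values are assumptions chosen for this example.

```python
def inner(z, w):
    """Standard hermitian inner product on C^N: sum of conj(z_i) * w_i."""
    return sum(a.conjugate() * b for a, b in zip(z, w))

def combo(lam, z, mu, w):
    """The linear combination lam * z + mu * w, computed slot-wise."""
    return [lam * a + mu * b for a, b in zip(z, w)]

# Sample vectors and scalars, chosen only for illustration.
x = [1 + 2j, 3 - 1j]
z = [2 - 1j, 1j]
w = [-1 + 1j, 4 + 0j]
lam, mu = 2 + 3j, 1 - 1j

# HIP1: swapping the arguments conjugates the result.
hip1_gap = abs(inner(z, w) - inner(w, z).conjugate())
# HIP2: linearity in the second slot.
second_slot_gap = abs(inner(x, combo(lam, z, mu, w))
                      - (lam * inner(x, z) + mu * inner(x, w)))
# Conjugate linearity in the first slot.
first_slot_gap = abs(inner(combo(lam, z, mu, w), x)
                     - (lam.conjugate() * inner(z, x)
                        + mu.conjugate() * inner(w, x)))
# HIP3: <z, z> is real and positive for z != 0.
norm_squared = inner(z, z)
```

All three gaps should vanish up to rounding, while `norm_squared` has zero imaginary part and positive real part.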

Just as in the real case, the inner product of any two vectors is determined
by the matrix of inner products relative to any basis. Let $\{e_i\}$ be a
basis for $\mathbb{V}$. Let $v = \sum_{i=1}^{N} v_i e_i$ and
$w = \sum_{i=1}^{N} w_i e_i$ be any two vectors. Then their inner product is
given by

$$\begin{aligned}
\langle v, w\rangle &= \Bigl\langle \sum_{i=1}^{N} v_i e_i, \sum_{j=1}^{N} w_j e_j \Bigr\rangle \\
&= \sum_{i,j=1}^{N} v_i^*\, w_j \langle e_i, e_j\rangle . && \text{(using HIP1,2)}
\end{aligned}$$

In other words, all we need to know in order to compute this are the complex
numbers $H_{ij} := \langle e_i, e_j\rangle$, which can be thought of as the
entries of a matrix $\mathsf{H}$. If we think of $v$ as a column vector
$\mathsf{v}$ in $\mathbb{C}^N$ whose entries are the components of $v$
relative to the basis $\{e_i\}$, and the same for $w$, we can compute their
inner product using matrix multiplication:

$$\langle v, w\rangle = (\mathsf{v}^*)^t\, \mathsf{H}\, \mathsf{w} .$$


We saw in the real case that the analogous matrix there was symmetric
and positive-definite, reflecting the similar properties of the inner product.
In the complex case, we expect that $\mathsf{H}$ should still be
positive-definite but that instead of symmetry it should obey a property based
on HIP1. Indeed, it follows from HIP1 that

$$H_{ij} = \langle e_i, e_j\rangle = \langle e_j, e_i\rangle^* = H_{ji}^* .$$

This means that the matrix $\mathsf{H}$ is equal to its conjugate transpose:

$$\mathsf{H} = (\mathsf{H}^*)^t . \tag{1.46}$$

Such matrices are called hermitian. Property HIP3 means that $\mathsf{H}$ is
positive-definite, so that in particular it is non-degenerate.

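Let us see how $\mathsf{H}$ transforms under a change of basis. Let
$\{e'_i\}$ be a new basis, with $e'_i = S(e_i)$ for some complex linear
transformation $S$. Relative to $\{e_i\}$ the linear transformation $S$ is
represented by a matrix $\mathsf{S}$ with entries $S_{ji}$ given by equation
(1.21). Let $\mathsf{H}'$ denote the matrix describing the inner product in
the new basis: its entries $H'_{ij}$ are given by

$$\begin{aligned}
H'_{ij} &= \langle e'_i, e'_j\rangle && \text{(by definition)} \\
&= \Bigl\langle \sum_{k=1}^{N} S_{ki}\, e_k, \sum_{l=1}^{N} S_{lj}\, e_l \Bigr\rangle && \text{(by equation (1.21))} \\
&= \sum_{k,l=1}^{N} S_{ki}^*\, S_{lj} \langle e_k, e_l\rangle && \text{(using HIP1,2)} \\
&= \sum_{k,l=1}^{N} S_{ki}^*\, H_{kl}\, S_{lj} .
\end{aligned}$$

In other words,

$$\mathsf{H}' = (\mathsf{S}^*)^t\, \mathsf{H}\, \mathsf{S} , \tag{1.47}$$

to be contrasted with the analogous formula (1.35).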
The Cauchy-Schwarz and triangle inequalities are still valid for hermitian
inner products. The proofs are essentially the same as for the real
case. We will therefore be brief.

In order to prove the Cauchy-Schwarz inequality, we start with the following
inequality, which follows from HIP3,

$$\|v - \lambda w\|^2 \geq 0 ,$$

and choose $\lambda \in \mathbb{C}$ appropriately. Expanding this out using
HIP1 and HIP2 we can rewrite it as

$$\|v\|^2 + |\lambda|^2 \|w\|^2 - \lambda \langle v, w\rangle - \lambda^* \langle w, v\rangle \geq 0 .$$

Hence if we choose $\lambda = \langle w, v\rangle / \|w\|^2$ (for $w \neq 0$;
the case $w = 0$ is trivial), we turn the inequality into

$$\|v\|^2 - \frac{|\langle v, w\rangle|^2}{\|w\|^2} \geq 0 ,$$

which can be rewritten as

$$|\langle v, w\rangle|^2 \leq \|v\|^2\, \|w\|^2 .$$

Taking square roots (all quantities are non-negative) we obtain the
Cauchy-Schwarz inequality (1.36).

In order to prove the triangle inequality, we start with

$$\begin{aligned}
\|v + w\|^2 &= \langle v + w, v + w\rangle \\
&= \|v\|^2 + \|w\|^2 + 2 \operatorname{Re} \langle v, w\rangle \\
&\leq \|v\|^2 + \|w\|^2 + 2 |\langle v, w\rangle| && \text{(since $\operatorname{Re} z \leq |z|$ for all $z \in \mathbb{C}$)} \\
&\leq \|v\|^2 + \|w\|^2 + 2 \|v\|\, \|w\| && \text{(by Cauchy-Schwarz)} \\
&= (\|v\| + \|w\|)^2 ,
\end{aligned}$$

whence taking square roots we obtain the triangle inequality (1.37).
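The complex analogue of an orthonormal basis is a unitary basis. Explicitly,
a basis $\{e_i\}$ is said to be unitary if

$$\langle e_i, e_j\rangle = \delta_{ij} := \begin{cases} 1 & \text{if } i = j \\ 0 & \text{otherwise.} \end{cases} \tag{1.48}$$

The components of a vector $v$ relative to a unitary basis $\{e_i\}$ can be
computed by taking inner products, just as in the real case. Let
$v = \sum_{i=1}^{N} v_i e_i$, and take its inner product with $e_j$:

$$\begin{aligned}
\langle e_j, v\rangle &= \Bigl\langle e_j, \sum_{i=1}^{N} v_i e_i \Bigr\rangle \\
&= \sum_{i=1}^{N} v_i \langle e_j, e_i\rangle && \text{(using HIP2)} \\
&= v_j . && \text{(using equation (1.48))}
\end{aligned}$$

This shows that unitary vectors are automatically linearly independent.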
One still has the Gram-Schmidt procedure for hermitian inner products.
It works essentially in the same way as in the real case, so we will not spend
much time on this. Consider a basis $\{f_i\}$ for $\mathbb{V}$. Define the
following vectors:

$$e_i = \frac{f_i - \sum_{j=1}^{i-1} \langle e_j, f_i\rangle\, e_j}{\bigl\|f_i - \sum_{j=1}^{i-1} \langle e_j, f_i\rangle\, e_j\bigr\|} .$$

It is easily checked that they are a unitary basis. First of all each $e_i$ is
clearly normalised, because it is defined as a vector divided by its norm; and
moreover if $i > j$, then $e_i$ is clearly orthogonal to $e_j$.

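Finally, we discuss the adjoint of a complex linear map relative to a
hermitian inner product. Let $A : \mathbb{V} \to \mathbb{V}$ be a complex
linear map. We define its adjoint $A^\dagger$ by equation (1.41), where now
$\langle\cdot,\cdot\rangle$ is a hermitian inner product. The properties
(1.42) and (1.44) still hold, and are proven in exactly the same way.

Only property (1.43) changes, reflecting the sesquilinear nature of the
inner product. Indeed notice that

$$\begin{aligned}
\langle (\lambda A + \mu B)^\dagger v, w\rangle &= \langle v, (\lambda A + \mu B) w\rangle && \text{(by (1.41))} \\
&= \lambda \langle v, A w\rangle + \mu \langle v, B w\rangle && \text{(by HIP2)} \\
&= \lambda \langle A^\dagger v, w\rangle + \mu \langle B^\dagger v, w\rangle && \text{(by (1.41))} \\
&= (\lambda^* \langle w, A^\dagger v\rangle + \mu^* \langle w, B^\dagger v\rangle)^* && \text{(by HIP1)} \\
&= \langle w, (\lambda^* A^\dagger + \mu^* B^\dagger) v\rangle^* && \text{(by HIP2)} \\
&= \langle (\lambda^* A^\dagger + \mu^* B^\dagger) v, w\rangle , && \text{(by HIP1)}
\end{aligned}$$

whence

$$(\lambda A + \mu B)^\dagger = \lambda^* A^\dagger + \mu^* B^\dagger . \tag{1.49}$$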


A complex linear transformation $A$ is said to be hermitian if
$A^\dagger = A$, and it is said to be anti-hermitian (also skew-hermitian) if
$A^\dagger = -A$. As in the real case, the nomenclature can be justified by
noticing that the matrix of a hermitian transformation relative to a unitary
basis is hermitian, as defined in equation (1.46). The proof is similar to the
proof of the analogous statement in the real case. Indeed,

$$\begin{aligned}
A^\dagger_{ij} &= \langle e_i, A^\dagger(e_j)\rangle && \text{(by equation (1.45))} \\
&= \langle A^\dagger(e_j), e_i\rangle^* && \text{(using HIP1)} \\
&= \langle e_j, A(e_i)\rangle^* && \text{(using equation (1.41))} \\
&= A_{ji}^* . && \text{(by equation (1.45))}
\end{aligned}$$

Therefore if $A^\dagger = A$, then $\mathsf{A} = (\mathsf{A}^*)^t$, and the
matrix is hermitian. Notice that if $\mathsf{A}$ is a hermitian matrix, then
$i\mathsf{A}$ is anti-hermitian, hence unlike the real case, the distinction
between hermitian and anti-hermitian is trivial.

Let us say that a linear transformation $U$ is unitary if
$U^\dagger \circ U = U \circ U^\dagger = \mathbf{1}$. In this case, its matrix
$\mathsf{U}$ relative to a unitary basis obeys
$(\mathsf{U}^*)^t\, \mathsf{U} = \mathsf{U}\, (\mathsf{U}^*)^t = \mathsf{I}$.
This means that the conjugate transpose is the inverse,

$$\mathsf{U}^{-1} = (\mathsf{U}^*)^t . \tag{1.50}$$

Not surprisingly, such matrices are called unitary. Finally let us notice that
just as in the real case, a unitary transformation preserves the inner product:

$$\begin{aligned}
\langle U(v), U(w)\rangle &= \langle v, U^\dagger(U(w))\rangle && \text{(by equation (1.41))} \\
&= \langle v, (U^\dagger \circ U)(w)\rangle && \text{(by equation (1.2))} \\
&= \langle v, w\rangle . && \text{(since $U$ is unitary)}
\end{aligned}$$
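The hermitian Gram-Schmidt procedure and the defining property of unitary matrices can be illustrated together. In the sketch below, which assumes the standard hermitian inner product on $\mathbb{C}^2$, a unitary basis is built from a sample basis and its vectors are used as the columns of a matrix $\mathsf{U}$. The names `inner`, `gram_schmidt` and `apply`, and the sample vectors, are hypothetical choices for this example. Note that the coefficient subtracted is $\langle e_j, f_i\rangle$, with $e_j$ in the first (conjugate linear) slot.

```python
import math

def inner(z, w):
    """Standard hermitian inner product on C^N (an assumption of this sketch)."""
    return sum(a.conjugate() * b for a, b in zip(z, w))

def gram_schmidt(fs):
    """Build a unitary basis from the basis `fs`."""
    es = []
    for f in fs:
        r = list(f)
        for e in es:
            c = inner(e, f)  # <e_j, f_i>: e_j sits in the conjugate-linear slot
            r = [ri - c * ei for ri, ei in zip(r, e)]
        norm = math.sqrt(inner(r, r).real)
        es.append([ri / norm for ri in r])
    return es

def apply(M, v):
    """Matrix M acting on the column vector v."""
    return [sum(M[i][k] * v[k] for k in range(len(v))) for i in range(len(M))]

fs = [[1 + 1j, 1j], [2 + 0j, 1 - 1j]]  # a sample basis of C^2
es = gram_schmidt(fs)

# U has the vectors e_i as its columns.
U = [[es[col][row] for col in range(2)] for row in range(2)]
# ((U*)^t U)_ij = sum_k conj(U_ki) U_kj, which should be the identity.
UdagU = [[sum(U[k][i].conjugate() * U[k][j] for k in range(2))
          for j in range(2)] for i in range(2)]

v = [1 - 2j, 3 + 1j]
w = [2j, -1 + 1j]
preserved_gap = abs(inner(apply(U, v), apply(U, w)) - inner(v, w))
```

Up to rounding, `UdagU` is the identity matrix and `preserved_gap` vanishes, in line with the calculation above.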


1.4 The eigenvalue problem and applications 


In this section we study perhaps the most important aspect of linear algebra 
from a physical perspective: the so-called eigenvalue problem. We mentioned 
when we introduced the notion of a basis that a good choice of basis can often 
simplify the solution of a problem involving linear transformations. Given 
a linear transformation, it is hard to imagine a better choice of basis than 
one in which the matrix is diagonal. However not all linear transformations
admit such a basis. Understanding which transformations admit such a basis
is an important part of linear algebra; but one whose full solution requires
more machinery than the one we will have available in this course. We will 
content ourselves with showing that certain types of linear transformation of 
use in physics do admit a diagonal basis. We will finish this section with two 
applications of these results: one to mathematics (quadratic forms) and one 
to physics (normal modes). 


1.4.1 Eigenvectors and eigenvalues 


Throughout this section $\mathbb{V}$ shall be an $N$-dimensional complex
vector space. Let $A : \mathbb{V} \to \mathbb{V}$ be a complex linear
transformation. Let $v \in \mathbb{V}$ be a nonzero vector which obeys

$$A v = \lambda v \qquad \text{for some } \lambda \in \mathbb{C}. \tag{1.51}$$

We say that $v$ is an eigenvector of $A$ with eigenvalue $\lambda$. Let
$\{e_i\}$ be a basis for $\mathbb{V}$. Let $\mathsf{v}$ be the column vector
whose entries are the components $v_i$ of $v$ relative to this basis:
$v = \sum_i v_i e_i$; and let $\mathsf{A}$ be the matrix representing $A$
relative to this basis. Then equation (1.51) becomes

$$\mathsf{A}\, \mathsf{v} = \lambda\, \mathsf{v} . \tag{1.52}$$

Rewriting this as

$$(\mathsf{A} - \lambda\, \mathsf{I})\, \mathsf{v} = 0 ,$$

we see that the matrix $\mathsf{A} - \lambda\, \mathsf{I}$ annihilates a
nonzero vector, whence it must have zero determinant:

$$\det(\mathsf{A} - \lambda\, \mathsf{I}) = 0 . \tag{1.53}$$

Let $\lambda$ be an eigenvalue of $A$. The set of eigenvectors of $A$ with
eigenvalue $\lambda$, together with the zero vector, form a vector subspace
$\mathbb{V}_\lambda$ of $\mathbb{V}$, known as the eigenspace of $A$ with
eigenvalue $\lambda$.


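It is easy to prove this: all one needs to show is that $\mathbb{V}_\lambda$
is closed under vector addition and scalar multiplication. Indeed, let $v$ and
$w$ be eigenvectors of $A$ with eigenvalue $\lambda$ and let $\alpha, \beta$
be scalars. Then

$$\begin{aligned}
A(\alpha v + \beta w) &= \alpha A(v) + \beta A(w) && \text{(by L1,2)} \\
&= \alpha \lambda v + \beta \lambda w && \text{(by equation (1.51))} \\
&= \lambda (\alpha v + \beta w) ,
\end{aligned}$$

whence $\alpha v + \beta w$ is also an eigenvector of $A$ with eigenvalue
$\lambda$ (or the zero vector).

That $\mathbb{V}_\lambda$ is a subspace also follows trivially from the fact
that it is the kernel of the linear transformation $A - \lambda \mathbf{1}$.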


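The dimension of the eigenspace $\mathbb{V}_\lambda$ is called the
multiplicity of the eigenvalue $\lambda$. One says that an eigenvalue
$\lambda$ is non-degenerate if $\mathbb{V}_\lambda$ is one-dimensional and
degenerate otherwise.

A linear transformation $A : \mathbb{V} \to \mathbb{V}$ is diagonalisable if
there exists a basis $\{e_i\}$ for $\mathbb{V}$ made up of eigenvectors of
$A$. In this basis, the matrix $\mathsf{A}$ representing $A$ is a diagonal
matrix:

$$\mathsf{A} = \begin{pmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \ddots & \\ & & & \lambda_N \end{pmatrix} ,$$

where not all of the $\lambda_i$ need be distinct. In this basis we can
compute the trace and the determinant very easily. We see that

$$\operatorname{tr}(\mathsf{A}) = \lambda_1 + \lambda_2 + \cdots + \lambda_N = \sum_{i=1}^{N} \lambda_i ,$$

$$\det(\mathsf{A}) = \lambda_1 \lambda_2 \cdots \lambda_N = \prod_{i=1}^{N} \lambda_i .$$

Therefore the trace is the sum of the eigenvalues and the determinant is
their product. This is independent of the basis, since both the trace and the
determinant are invariants.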


© This has a very interesting consequence. Consider the identity:

$$\prod_{i=1}^{N} \exp(\lambda_i) = \exp\Bigl(\sum_{i=1}^{N} \lambda_i\Bigr) .$$

We can interpret this identity as an identity involving the diagonal matrix
$\mathsf{A}$:

$$\det(\exp(\mathsf{A})) = \exp(\operatorname{tr}(\mathsf{A})) ,$$

where the exponential of a matrix is defined via its Taylor series expansion:

$$\exp(\mathsf{A}) = \mathsf{I} + \mathsf{A} + \tfrac{1}{2} \mathsf{A}^2 + \cdots = \sum_{n=0}^{\infty} \frac{1}{n!}\, \mathsf{A}^n ,$$

so that for a diagonal matrix, it is simply the diagonal matrix of the
exponentials of its diagonal entries. Now notice that under a change of basis
$\mathsf{A} \mapsto \mathsf{A}'$, where $\mathsf{A}'$ is given by equation
(1.24),

$$\begin{aligned}
\exp(\mathsf{A}') &= \sum_{n=0}^{\infty} \frac{1}{n!}\, (\mathsf{A}')^n \\
&= \sum_{n=0}^{\infty} \frac{1}{n!}\, (\mathsf{S}^{-1} \mathsf{A}\, \mathsf{S})^n && \text{(by equation (1.24))} \\
&= \sum_{n=0}^{\infty} \frac{1}{n!}\, \mathsf{S}^{-1} \mathsf{A}^n\, \mathsf{S} \\
&= \mathsf{S}^{-1} \exp(\mathsf{A})\, \mathsf{S} ;
\end{aligned}$$

whence because the trace and determinant are invariants

$$\det \exp(\mathsf{A}') = \exp \operatorname{tr}(\mathsf{A}') .$$

Hence this equation is still true for diagonalisable matrices. In fact, it
follows from the fact (see next section) that diagonalisable matrices are
dense in the space of matrices, that this identity is true for arbitrary
matrices:

$$\det(\exp(\mathsf{A})) = \exp(\operatorname{tr}(\mathsf{A})) . \tag{1.54}$$

This is an extremely useful formula, particularly in quantum field theory and
statistical mechanics, where it is usually applied to define the determinant
of infinite-dimensional matrices.
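Identity (1.54) can be checked numerically for small matrices. The sketch below is an illustration only: it sums a truncated Taylor series (the cut-off `terms=60` is an arbitrary assumption that is ample for matrices with small entries), and the names `matmul`, `mat_exp` and `det2` as well as both sample matrices are hypothetical. The first sample matrix is not diagonalisable, so it probes the claim that the identity holds for arbitrary matrices.

```python
import cmath

def matmul(A, B):
    """Product of two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(A, terms=60):
    """Truncated Taylor series for exp(A): the sum of A^k / k! for k < terms."""
    n = len(A)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]  # holds A^k / k!, starting at k = 0
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, A)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result

def det2(M):
    """Determinant of a 2x2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

A1 = [[1, 2], [0, 1]]            # not diagonalisable
A2 = [[1j, 2], [0.5, -1 + 1j]]   # a complex sample matrix

gap1 = abs(det2(mat_exp(A1)) - cmath.exp(A1[0][0] + A1[1][1]))
gap2 = abs(det2(mat_exp(A2)) - cmath.exp(A2[0][0] + A2[1][1]))
```

Both gaps should be zero up to rounding error.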


1.4.2 Diagonalisability 


Throughout this section V is an N-dimensional complex vector space. 

It turns out that not every linear transformation is diagonalisable, but 
many of the interesting ones in physics will be. In this section, which lies 
somewhat outside the main scope of this course, we will state the condition 
for a linear transformation to be diagonalisable. 

Fix a basis for $\mathbb{V}$ and let $\mathsf{A}$ be the matrix representing
$A$ relative to this basis. Let us define the following polynomial

$$\chi_{\mathsf{A}}(t) = \det(\mathsf{A} - t\, \mathsf{I}) , \tag{1.55}$$

known as the characteristic polynomial of the matrix $\mathsf{A}$. Under a
change of basis, the matrix $\mathsf{A}$ changes to the matrix $\mathsf{A}'$
given by equation (1.24). The characteristic polynomial of the transformed
matrix $\mathsf{A}'$ is given by

$$\begin{aligned}
\chi_{\mathsf{A}'}(t) &= \det(\mathsf{A}' - t\, \mathsf{I}) \\
&= \det(\mathsf{S}^{-1} \mathsf{A}\, \mathsf{S} - t\, \mathsf{I}) && \text{(by equation (1.24))} \\
&= \det(\mathsf{S}^{-1} (\mathsf{A} - t\, \mathsf{I})\, \mathsf{S}) && \text{(since $\mathsf{S}^{-1}\, \mathsf{I}\, \mathsf{S} = \mathsf{I}$)} \\
&= \det(\mathsf{S}^{-1}) \det(\mathsf{A} - t\, \mathsf{I}) \det(\mathsf{S}) && \text{(by equation (1.18))} \\
&= \frac{1}{\det(\mathsf{S})}\, \chi_{\mathsf{A}}(t) \det(\mathsf{S}) \\
&= \chi_{\mathsf{A}}(t) .
\end{aligned}$$

In other words, the characteristic polynomial is a matrix invariant and hence
is a property of the linear transformation $A$. We will therefore define the
characteristic polynomial $\chi_A(t)$ of a linear transformation
$A : \mathbb{V} \to \mathbb{V}$ as the polynomial $\chi_{\mathsf{A}}(t)$ of
the matrix which represents it relative to any basis. By the above calculation
it does not depend on the basis.

The characteristic polynomial is a polynomial of order $N$ where $N$ is the
complex dimension of $\mathbb{V}$. Its highest order term is of the form
$(-1)^N t^N$ and its zeroth order term is the determinant of $A$, as can be
seen by evaluating $\chi_A(t)$ at $t = 0$. In other words,

$$\chi_A(t) = \det(A) + \cdots + (-1)^N t^N .$$

Equation (1.53) implies that every eigenvalue $\lambda$ of $A$ is a root of
its characteristic polynomial: $\chi_A(\lambda) = 0$. Conversely it is
possible to prove that every root of the characteristic polynomial is an
eigenvalue of $A$; although the multiplicities need not correspond: the
multiplicity of the eigenvalue is never larger than that of the root.

This gives a method to compute the eigenvalues and eigenvectors of a
linear transformation $A$. We simply choose a basis and find the matrix
$\mathsf{A}$ representing $A$. We compute its characteristic polynomial and
find its roots. For each root $\lambda$ we solve the system of linear
homogeneous equations:

$$(\mathsf{A} - \lambda\, \mathsf{I})\, \mathsf{v} = 0 .$$
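For a $2 \times 2$ matrix the method just described can be carried out in closed form, since the characteristic polynomial is $t^2 - \operatorname{tr}(\mathsf{A})\, t + \det(\mathsf{A})$. The following is a minimal sketch under stated assumptions: the function name `eigenpairs_2x2` is hypothetical, the matrix is assumed not to be diagonal (otherwise the eigenpairs can be read off directly), and the sample matrix is the complex structure $\mathsf{J}$ of Section 1.3.5.

```python
import cmath

def eigenpairs_2x2(A):
    """Eigenvalues and eigenvectors of a non-diagonal 2x2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = A
    trace, det = a + d, a * d - b * c
    # Roots of the characteristic polynomial t^2 - trace * t + det.
    root = cmath.sqrt(trace * trace - 4 * det)
    pairs = []
    for lam in ((trace + root) / 2, (trace - root) / 2):
        # A nonzero solution of (A - lam I) v = 0.
        if b != 0:
            v = (b, lam - a)
        elif c != 0:
            v = (lam - d, c)
        else:
            raise ValueError("diagonal matrix: read the eigenpairs off directly")
        pairs.append((lam, v))
    return pairs

J = [[0, -1], [1, 0]]  # real matrix whose eigenvalues are not real
pairs = eigenpairs_2x2(J)

# Residuals of A v - lam v for each pair; they should vanish.
residuals = [
    max(abs(J[i][0] * v[0] + J[i][1] * v[1] - lam * v[i]) for i in range(2))
    for lam, v in pairs
]
eigenvalue_sum = pairs[0][0] + pairs[1][0]       # should equal the trace
eigenvalue_product = pairs[0][0] * pairs[1][0]   # should equal the determinant
```

The example also shows why the eigenvalue problem is posed over the complex numbers: the real matrix $\mathsf{J}$ has the eigenvalues $\pm i$. When the two roots coincide, the function returns the same eigenvector twice, which is the situation in which the multiplicity of the eigenvalue can be smaller than that of the root.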


This approach rests on the following general fact, known as the Fundamental
Theorem of Algebra: every non-constant complex polynomial has a root. In fact,
any complex polynomial of order $N$ has $N$ roots counted with multiplicity.
In particular, the characteristic polynomial factorises into a product of
linear factors:

$$\chi_A(t) = (\lambda_1 - t)^{m_1} (\lambda_2 - t)^{m_2} \cdots (\lambda_k - t)^{m_k} ,$$

where all the $\lambda_i$ are distinct and where $m_i \geq 1$ are positive
integers. Clearly each $\lambda_i$ is a root and $m_i$ is its multiplicity.
Each $\lambda_i$ is also an eigenvalue of $A$, but $m_i$ is not necessarily
the multiplicity of the eigenvalue $\lambda_i$. Consider the matrix

$$\mathsf{A} = \begin{pmatrix} 1 & a \\ 0 & 1 \end{pmatrix} ,$$

where $a \neq 0$ is any complex number. Its characteristic polynomial is given
by

$$\chi_{\mathsf{A}}(t) = \det(\mathsf{A} - t\, \mathsf{I}) = \begin{vmatrix} 1 - t & a \\ 0 & 1 - t \end{vmatrix} = (1 - t)^2 = 1 - 2t + t^2 .$$

Hence the only root of this polynomial is $1$ with multiplicity $2$. The
number $1$ is also an eigenvalue of $\mathsf{A}$. For example, an eigenvector
$\mathsf{v}$ is given by

$$\mathsf{v} = \begin{pmatrix} 1 \\ 0 \end{pmatrix} .$$

However, the multiplicity of the eigenvalue $1$ is only $1$. Indeed, if it
were $2$, this would mean that there are two linearly independent eigenvectors
with eigenvalue $1$. These eigenvectors would then form a basis, relative to
which $\mathsf{A}$ would be the identity matrix. But if
$\mathsf{A} = \mathsf{I}$ relative to some basis, then
$\mathsf{A}' = \mathsf{I}$ relative to any other basis, since the identity
matrix is invariant under change of basis. This contradicts the explicit
expression for $\mathsf{A}$ above.

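A result known as the Cayley-Hamilton Theorem states that any matrix
$\mathsf{A}$ satisfies the following polynomial equation:

$$\chi_{\mathsf{A}}(\mathsf{A}) = 0 ,$$

where $0$ means the matrix all of whose entries are zero, and where a scalar
$a$ is replaced by the scalar matrix $a\, \mathsf{I}$. For example, consider
the matrix $\mathsf{A}$ above:

$$\chi_{\mathsf{A}}(\mathsf{A}) = \mathsf{I} - 2 \mathsf{A} + \mathsf{A}^2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} - 2 \begin{pmatrix} 1 & a \\ 0 & 1 \end{pmatrix} + \begin{pmatrix} 1 & 2a \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix} .$$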

The Cayley-Hamilton theorem shows that any $N \times N$ matrix obeys an
$N$-th order polynomial equation. However in some cases an $N \times N$ matrix
$\mathsf{A}$ obeys a polynomial equation of smaller order. The polynomial
$\mu_{\mathsf{A}}(t)$ of smallest order such that

$$\mu_{\mathsf{A}}(\mathsf{A}) = 0 ,$$

is called the minimal polynomial of the matrix $\mathsf{A}$. One can show that
the minimal polynomial divides the characteristic polynomial. In fact, if the
characteristic polynomial has the factorisation

$$\chi_{\mathsf{A}}(t) = (\lambda_1 - t)^{m_1} (\lambda_2 - t)^{m_2} \cdots (\lambda_k - t)^{m_k} ,$$

the minimal polynomial has the factorisation

$$\mu_{\mathsf{A}}(t) = (\lambda_1 - t)^{n_1} (\lambda_2 - t)^{n_2} \cdots (\lambda_k - t)^{n_k} ,$$

where $1 \leq n_i \leq m_i$. The main result in this topic is that a matrix
$\mathsf{A}$ is diagonalisable if and only if all $n_i = 1$. For the
non-diagonalisable matrix above, we see that its characteristic polynomial
equals its minimal polynomial, since $\mathsf{A} \neq \mathsf{I}$.
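For the $2 \times 2$ example matrix used above, both statements can be verified by direct computation. The sketch below is an illustration only; the value chosen for `a` and the helper name `matmul` are assumptions of this example. It evaluates the characteristic polynomial $1 - 2t + t^2$ and the only lower-order candidate $1 - t$ on the matrix.

```python
def matmul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

a = 3 + 2j  # any nonzero complex number; this value is only a sample
A = [[1, a], [0, 1]]
I = [[1, 0], [0, 1]]
A2 = matmul(A, A)

# chi_A(A) = I - 2A + A^2, with the scalar 1 replaced by the identity matrix.
chi_of_A = [[I[i][j] - 2 * A[i][j] + A2[i][j] for j in range(2)]
            for i in range(2)]
# The candidate of lower order, 1 - t, evaluated on A.
candidate = [[I[i][j] - A[i][j] for j in range(2)] for i in range(2)]

cayley_hamilton_holds = all(x == 0 for row in chi_of_A for x in row)
lower_order_fails = any(x != 0 for row in candidate for x in row)
```

Here `chi_of_A` is the zero matrix, whereas `candidate` has the nonzero entry $-a$, so no polynomial of order one annihilates the matrix and the minimal polynomial is the characteristic polynomial itself.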

In particular this shows that if all eigenvalues of a linear transformation 
are non-degenerate, then the linear transformation is diagonalisable. Given 
any matrix, one need only perturb it infinitesimally to lift any degeneracy its 
eigenvalues might have. This then implies that the diagonalisable matrices 
are dense in the space of matrices; that is, infinitesimally close to any non- 
diagonalisable matrix there is one which is diagonalisable. This is key to 
proving many identities involving matrices. If an identity of the form f(A) = 
0 holds for diagonalisable matrices then it holds for any matrix provided that 
f is a continuous function. 

Computing the minimal polynomial of a linear transformation is not an 
easy task, hence it is in practice not very easy to decide whether or not a 
given linear transformation is diagonalisable. Luckily large classes of linear 
transformations can be shown to be diagonalisable, as we will now discuss. 


1.4.3 Spectral theorem for hermitian transformations 


Throughout this section $\mathbb{V}$ is an $N$-dimensional complex vector
space with a hermitian inner product $\langle\cdot,\cdot\rangle$.

Let $A : \mathbb{V} \to \mathbb{V}$ be a hermitian linear transformation:
$A^\dagger = A$. We will show that it is diagonalisable. As a corollary we
will see that unitary transformations $U : \mathbb{V} \to \mathbb{V}$ such
that $U^\dagger \circ U = U \circ U^\dagger = \mathbf{1}$ are also
diagonalisable. These results are known as the spectral theorems for hermitian
and unitary transformations.

We will first need to show two key results about the eigenvalues and
eigenvectors of a hermitian transformation. First we will show that the
eigenvalues of a hermitian transformation are real. Let $v$ be an eigenvector
of $A$ with eigenvalue $\lambda$. Then on the one hand,

$$\begin{aligned}
\langle A(v), v\rangle &= \langle \lambda v, v\rangle \\
&= \lambda^* \langle v, v\rangle ; && \text{(by sesquilinearity)}
\end{aligned}$$

whereas on the other hand,

$$\begin{aligned}
\langle A(v), v\rangle &= \langle v, A^\dagger(v)\rangle && \text{(by equation (1.41))} \\
&= \langle v, A(v)\rangle && \text{(since $A$ is hermitian)} \\
&= \langle v, \lambda v\rangle \\
&= \lambda \langle v, v\rangle . && \text{(by HIP2)}
\end{aligned}$$

Hence,

$$(\lambda - \lambda^*)\, \|v\|^2 = 0 .$$

Since $v \neq 0$, HIP3 implies that $\|v\|^2 \neq 0$, whence
$\lambda = \lambda^*$.

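The second result is that eigenvectors corresponding to different eigenvalues
are orthogonal. Let $v$ and $w$ be eigenvectors with distinct eigenvalues
$\lambda$ and $\mu$, respectively. Then on the one hand,

$$\begin{aligned}
\langle A(v), w\rangle &= \langle \lambda v, w\rangle \\
&= \lambda \langle v, w\rangle . && \text{(since $\lambda$ is real)}
\end{aligned}$$

On the other hand,

$$\begin{aligned}
\langle A(v), w\rangle &= \langle v, A^\dagger(w)\rangle && \text{(by equation (1.41))} \\
&= \langle v, A(w)\rangle && \text{(since $A$ is hermitian)} \\
&= \langle v, \mu w\rangle \\
&= \mu \langle v, w\rangle . && \text{(by HIP2)}
\end{aligned}$$

Hence,

$$(\lambda - \mu)\, \langle v, w\rangle = 0 ,$$

whence if $\lambda \neq \mu$, $v \perp w$.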
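Both results can be observed on a small example. The following sketch takes a sample $2 \times 2$ hermitian matrix (an assumption of this illustration, as are the variable names), finds its eigenvalues from the characteristic polynomial $t^2 - \operatorname{tr}(\mathsf{H})\, t + \det(\mathsf{H})$, and checks that they are real and that the corresponding eigenvectors are orthogonal with respect to the standard hermitian inner product on $\mathbb{C}^2$.

```python
import cmath

def inner(z, w):
    """Standard hermitian inner product on C^N (an assumption of this sketch)."""
    return sum(a.conjugate() * b for a, b in zip(z, w))

# A sample hermitian matrix: it equals its own conjugate transpose.
H = [[2, 1 - 1j], [1 + 1j, 3]]
(a, b), (c, d) = H

trace, det = a + d, a * d - b * c
root = cmath.sqrt(trace * trace - 4 * det)
lam1, lam2 = (trace + root) / 2, (trace - root) / 2

# Since b != 0, (b, lam - a) solves (H - lam I) v = 0.
v1 = (b, lam1 - a)
v2 = (b, lam2 - a)

imaginary_parts = (lam1.imag, lam2.imag)  # should both vanish
overlap = inner(v1, v2)                   # should vanish since lam1 != lam2
```

For this matrix the trace is $5$ and the determinant is $4$, so the eigenvalues are $4$ and $1$, both real and distinct, and `overlap` vanishes.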
Now we need a basic fact: every hermitian transformation has at least 
one eigenvalue. 


© This can be shown using variational calculus. Consider the expression 


f(v) = (v, A(v)) . 
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We claim that f(v) is a real number: 


f(v)* = (v, A(w))* 
= (A(v), v) (by HIP1) 
= (v, At (v)) (by equation (@.41)) 
(v, A(v)) (since A is hermitian) 
= f(v) 


Therefore $f$ defines a continuous quadratic function from $\mathbb{V}$ to the real numbers. We would
like to extremise this function. Clearly,

\[
f(\alpha \mathbf{v}) = |\alpha|^2 f(\mathbf{v}) ,
\]

and this means that by rescaling $\mathbf{v}$ we can make $f(\mathbf{v})$ as large or as small as we want.
This is not the type of extremisation that we are interested in: we want to see in which
direction $f(\mathbf{v})$ is extremal. One way to do this is to restrict ourselves to vectors such that
$\|\mathbf{v}\|^2 = 1$. This can be imposed using a Lagrange multiplier $\lambda$. Extremising $f(\mathbf{v})$ subject
to the constraint $\|\mathbf{v}\|^2 = 1$ can be done by extremising the expression

\[
I(\mathbf{v}, \lambda) = f(\mathbf{v}) - \lambda \left( \|\mathbf{v}\|^2 - 1 \right) .
\]

The variation of $I$ yields the following expression:

\[
\delta I = 2 \langle \delta\mathbf{v}, (A - \lambda 1)\, \mathbf{v} \rangle - \delta\lambda \left( \|\mathbf{v}\|^2 - 1 \right) .
\]

Therefore the variational equations are $\|\mathbf{v}\|^2 = 1$ and

\[
A \mathbf{v} = \lambda \mathbf{v} ,
\]

where we have used the non-degeneracy of the inner product and the fact that we want
$\delta I = 0$ for all $\delta\lambda$ and $\delta\mathbf{v}$. This says that the extrema of $I$ are the pairs $(\mathbf{v}, \lambda)$
where $\mathbf{v}$ is a normalised eigenvector of $A$ with eigenvalue $\lambda$. The function $I(\mathbf{v}, \lambda)$ takes the
value $I(\mathbf{v}, \lambda) = \lambda$ at such a pair; whence the maxima and minima correspond to the largest
and smallest eigenvalues. It remains to argue that the variational problem has a solution.
This follows from the compactness of the space of normalised vectors, which is the unit
sphere in $\mathbb{V}$. The function $f(\mathbf{v})$ is continuous on the unit sphere and hence attains its
maxima and minima in it.
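As a rough numerical illustration of this variational characterisation, one can sample $f$ on the unit circle of $\mathbb{R}^2$ for a real symmetric matrix and compare its extreme values with the eigenvalues. The matrix below (with eigenvalues 1 and 3), the sampling density and the helper name `f` are assumptions made for this sketch only.

```python
import math

def f(v, A):
    """Quadratic function f(v) = <v, A v> for a real symmetric matrix A."""
    n = len(v)
    Av = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(v[i] * Av[i] for i in range(n))

A = [[2.0, -1.0], [-1.0, 2.0]]   # illustrative symmetric matrix, eigenvalues 1 and 3
samples = 3600                    # number of unit vectors sampled on the circle

values = []
for k in range(samples):
    theta = 2 * math.pi * k / samples
    values.append(f((math.cos(theta), math.sin(theta)), A))

f_min, f_max = min(values), max(values)
print(f_min, f_max)   # close to the smallest and largest eigenvalues of A
```

Restricting to unit vectors is what makes the extrema finite; without the constraint the scaling property of $f$ would let the values grow without bound.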


We are now ready to prove the spectral theorem. We will first assume 
that the eigenvalues are non-degenerate, for ease of exposition and then we 
will relax this hypothesis and prove the general result. 

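Let $\mathbf{v}_1$ be a normalised eigenvector of $A$ with eigenvalue $\lambda_1$. It exists
by the above discussion and it is the only such eigenvector, up to scalar
multiplication, by the non-degeneracy hypothesis. The eigenvalue is real as
we saw above. Choose vectors $\{\mathbf{e}_2, \mathbf{e}_3, \ldots\}$ such that $\{\mathbf{v}_1, \mathbf{e}_2, \ldots\}$ is a basis
for $\mathbb{V}$ and apply the Gram-Schmidt procedure if necessary so that it is a
unitary basis. Let us look at the matrix $\mathsf{A}$ of $A$ in such a basis. Because $\mathbf{v}_1$
is an eigenvector, one has, for $j = 2, \ldots, N$,

\[
\langle A(\mathbf{v}_1), \mathbf{e}_j \rangle = \langle \lambda_1 \mathbf{v}_1, \mathbf{e}_j \rangle = \lambda_1 \langle \mathbf{v}_1, \mathbf{e}_j \rangle = 0 ,
\]

and similarly

\[
\langle A(\mathbf{e}_j), \mathbf{v}_1 \rangle = \langle \mathbf{e}_j, A(\mathbf{v}_1) \rangle = \langle \mathbf{e}_j, \lambda_1 \mathbf{v}_1 \rangle = \lambda_1 \langle \mathbf{e}_j, \mathbf{v}_1 \rangle = 0 .
\]

Moreover

\[
\langle \mathbf{v}_1, A(\mathbf{v}_1) \rangle = \langle \mathbf{v}_1, \lambda_1 \mathbf{v}_1 \rangle = \lambda_1 \|\mathbf{v}_1\|^2 = \lambda_1 .
\]

This means that the matrix takes the form

\[
\mathsf{A} = \begin{pmatrix}
\lambda_1 & 0 & \cdots & 0 \\
0 & A_{22} & \cdots & A_{2N} \\
\vdots & \vdots & & \vdots \\
0 & A_{N2} & \cdots & A_{NN}
\end{pmatrix} . \tag{1.56}
\]

The submatrix

\[
\begin{pmatrix}
A_{22} & \cdots & A_{2N} \\
\vdots & & \vdots \\
A_{N2} & \cdots & A_{NN}
\end{pmatrix}
\]

is still hermitian, since for $i, j = 2, \ldots, N$,

\[
A_{ij} = \langle \mathbf{e}_i, A(\mathbf{e}_j) \rangle = \langle A(\mathbf{e}_j), \mathbf{e}_i \rangle^* = \langle \mathbf{e}_j, A(\mathbf{e}_i) \rangle^* = A_{ji}^* .
\]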


Now we can apply the procedure again to this $(N-1) \times (N-1)$ matrix:
we find a normalised eigenvector $\mathbf{v}_2$, which by assumption corresponds to
a non-degenerate eigenvalue $\lambda_2$. Starting with this eigenvector we build a
unitary basis $\{\mathbf{v}_2, \mathbf{e}'_3, \ldots\}$ for the $(N-1)$-dimensional subspace spanned by
the $\{\mathbf{e}_2, \mathbf{e}_3, \ldots\}$. The submatrix then takes a form analogous to the
one in equation (1.56), leaving an $(N-2) \times (N-2)$ submatrix which is again
hermitian. We can apply the same procedure to this smaller matrix, and
so on until we are left with a $1 \times 1$ hermitian matrix, i.e., a real number: $\lambda_N$.
The basis $\{\mathbf{v}_i\}$ formed by the eigenvectors is clearly unitary, since each $\mathbf{v}_i$
is normalised by definition and is orthogonal to the preceding $\mathbf{v}_j$, $j < i$, by the
way they were constructed. The matrix of $A$ relative to this basis is then

\[
\begin{pmatrix}
\lambda_1 & & & \\
& \lambda_2 & & \\
& & \ddots & \\
& & & \lambda_N
\end{pmatrix} ,
\]

with real eigenvalues $\lambda_i$.
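The case with degenerate eigenvalues works along similar lines. We start
with an eigenvalue $\lambda_1$ and consider the eigenspace $\mathbb{V}_{\lambda_1}$. It may be that the
dimension $m_1$ of $\mathbb{V}_{\lambda_1}$ is larger than 1. In any case, every nonzero vector in $\mathbb{V}_{\lambda_1}$ is an
eigenvector of $A$. Use Gram-Schmidt to find a unitary basis $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_{m_1}\}$
for $\mathbb{V}_{\lambda_1}$. Complete this basis to a unitary basis $\{\mathbf{v}_1, \ldots, \mathbf{v}_{m_1}, \mathbf{e}_{m_1+1}, \ldots, \mathbf{e}_N\}$
for $\mathbb{V}$, which can be done using Gram-Schmidt again if necessary. The matrix
$\mathsf{A}$ representing $A$ in this basis is given by

\[
\mathsf{A} = \begin{pmatrix}
\lambda_1 & & & 0 & \cdots & 0 \\
& \ddots & & \vdots & & \vdots \\
& & \lambda_1 & 0 & \cdots & 0 \\
0 & \cdots & 0 & A_{m_1+1,m_1+1} & \cdots & A_{m_1+1,N} \\
\vdots & & \vdots & \vdots & & \vdots \\
0 & \cdots & 0 & A_{N,m_1+1} & \cdots & A_{NN}
\end{pmatrix} ,
\]

where the off-diagonal blocks have vanishing entries because

\[
\langle \mathbf{e}_i, A(\mathbf{v}_j) \rangle = \lambda_1 \langle \mathbf{e}_i, \mathbf{v}_j \rangle = 0 .
\]

The submatrix

\[
\begin{pmatrix}
A_{m_1+1,m_1+1} & \cdots & A_{m_1+1,N} \\
\vdots & & \vdots \\
A_{N,m_1+1} & \cdots & A_{NN}
\end{pmatrix}
\]

is again hermitian, so we can apply the procedure again to it, until we are
left with a basis $\{\mathbf{v}_i\}$ of eigenvectors of $A$, so that the matrix is diagonal.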

In summary, suppose that we start with a hermitian matrix A, thought 
of as the matrix of a hermitian linear transformation A relative to a unitary 
basis. Then the above iterative procedure produces a unitary basis relative 
to which the matrix for A is diagonal. Because the initial and final bases
are unitary, the change of basis transformation U is unitary. In other words,
given a hermitian matrix $\mathsf{A}$ there is a unitary matrix $\mathsf{U}$ such that

\[
\mathsf{A}' = \mathsf{U}^\dagger \mathsf{A}\, \mathsf{U} = (\mathsf{U}^*)^t \mathsf{A}\, \mathsf{U}
\]

is diagonal. To summarise:


Every hermitian matrix can be diagonalised by a unitary transformation.


In fact the unitary matrix U above can be written down explicitly in 
terms of the normalised eigenvectors of A. Let {v;} be a set of normalised 
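eigenvectors which are mutually orthogonal. This is guaranteed if they have
different eigenvalues, and in the case of degenerate eigenvalues by Gram-Schmidt.
Consider the matrix

\[
\mathsf{U} = \begin{pmatrix} \mathsf{v}_1 & \mathsf{v}_2 & \cdots & \mathsf{v}_N \end{pmatrix} ,
\]

whose columns are the column vectors $\mathsf{v}_i$. We claim first of all that $\mathsf{U}$ is unitary. Indeed,

\[
\mathsf{U}^\dagger = \begin{pmatrix} \mathsf{v}_1^\dagger \\ \mathsf{v}_2^\dagger \\ \vdots \\ \mathsf{v}_N^\dagger \end{pmatrix} .
\]

Hence the $(i,j)$ entry of $\mathsf{U}^\dagger \mathsf{U}$ is

\[
\left( \mathsf{U}^\dagger \mathsf{U} \right)_{ij} = \mathsf{v}_i^\dagger \mathsf{v}_j = \langle \mathbf{v}_i, \mathbf{v}_j \rangle = \delta_{ij} ,
\]

so that $\mathsf{U}^\dagger \mathsf{U} = 1$, since the $\{\mathbf{v}_i\}$ form a unitary basis. Moreover, paying attention to the way
matrix multiplication is defined and using that the $\{\mathsf{v}_i\}$ are eigenvectors of
$\mathsf{A}$, we find

\[
\begin{aligned}
\mathsf{A}\,\mathsf{U} &= \begin{pmatrix} \lambda_1 \mathsf{v}_1 & \lambda_2 \mathsf{v}_2 & \cdots & \lambda_N \mathsf{v}_N \end{pmatrix} \\
&= \begin{pmatrix} \mathsf{v}_1 & \mathsf{v}_2 & \cdots & \mathsf{v}_N \end{pmatrix}
\begin{pmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \ddots & \\ & & & \lambda_N \end{pmatrix} \\
&= \mathsf{U}\, \mathsf{A}' .
\end{aligned}
\]

In other words, $\mathsf{A}' = \mathsf{U}^{-1} \mathsf{A}\, \mathsf{U}$ just as above.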
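The construction of the unitary matrix from normalised eigenvectors can be tried out on a small case. This sketch assumes a particular $2 \times 2$ hermitian matrix together with its hand-computed eigenvectors (eigenvalues 1 and 3); the helpers `matmul` and `dagger` are hypothetical names written for this example.

```python
import math

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def dagger(X):
    """Hermitian adjoint: complex conjugate of the transpose."""
    return [[X[j][i].conjugate() for j in range(len(X))]
            for i in range(len(X[0]))]

s = 1 / math.sqrt(2)
A = [[2, 1j], [-1j, 2]]      # illustrative hermitian matrix
v1 = [1j * s, -s]            # normalised eigenvector with eigenvalue 1
v2 = [1j * s, s]             # normalised eigenvector with eigenvalue 3

# Place the eigenvectors in the columns of U
U = [[v1[0], v2[0]],
     [v1[1], v2[1]]]

UdU = matmul(dagger(U), U)                 # should be the identity
A_diag = matmul(dagger(U), matmul(A, U))   # should be diag(1, 3)
print(UdU)
print(A_diag)
```

Up to rounding, the first product is the identity matrix and the second is diagonal with the eigenvalues on the diagonal, in the order in which the eigenvectors were placed in the columns.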
There is a real version of this result: if A is real and symmetric, then it 
is hermitian. We can diagonalise it with a unitary transformation, which can
be chosen to be real, whence it is orthogonal. This then yields the spectral theorem for
symmetric real matrices, which says that any real symmetric matrix can be
diagonalised by an orthogonal transformation.

We can now understand the positive-definiteness condition on the matrix
representing an inner product on a vector space. We saw in Section 1.3.1 that
the matrix $\mathsf{G}$ of an inner product in a real vector space is symmetric. Hence it
can be diagonalised by an orthogonal transformation. From equation (1.35),
it follows that there is a basis relative to which the matrix of the inner product
is diagonal. Let $\{\mathbf{e}_i\}$ be such a basis and let $\langle \mathbf{e}_i, \mathbf{e}_j \rangle = \lambda_i \delta_{ij}$. If $\mathbf{v} = \sum_i v_i \mathbf{e}_i$
is any vector, then

\[
\|\mathbf{v}\|^2 = \langle \mathbf{v}, \mathbf{v} \rangle = \sum_i \lambda_i v_i^2 .
\]

Axiom IP3 says that this quantity has to be positive for all nonzero vectors
$\mathbf{v}$, which clearly implies that $\lambda_i > 0$ for all $i$. Therefore a symmetric matrix
is positive definite if and only if all its eigenvalues are positive. A similar
statement also holds for hermitian inner products, whose proof is left as an
exercise.


It is not just hermitian matrices that can be diagonalised by unitary transformations. Let
us say that a linear transformation $N$ is normal if it commutes with its adjoint:

\[
N^\dagger \circ N = N \circ N^\dagger . \tag{1.57}
\]

Then it can be proven that $N$ is diagonalisable by a unitary transformation. As an
example consider the $3 \times 3$ matrix

\[
\mathsf{P} = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}
\]

considered in the Exercises. We saw that it was diagonalisable by a unitary transformation,
yet it is clearly not hermitian. Nevertheless it is easy to check that it is normal. Indeed,

\[
\mathsf{P}^\dagger = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} ,
\]

so that

\[
\mathsf{P}^\dagger \mathsf{P} = \mathsf{P}\, \mathsf{P}^\dagger = 1 ;
\]

in other words, it is unitary.
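The claims about this example can be verified directly. The sketch below assumes that the matrix in question is the cyclic permutation matrix whose adjoint is displayed above; the helper names and the trial eigenvectors $(1, \lambda, \lambda^2)$ with $\lambda$ a cube root of unity are choices made for this illustration.

```python
import cmath

def matmul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def dagger(X):
    """Hermitian adjoint: complex conjugate of the transpose."""
    n = len(X)
    return [[X[j][i].conjugate() for j in range(n)] for i in range(n)]

P = [[0, 1, 0],
     [0, 0, 1],
     [1, 0, 0]]            # assumed form of the matrix from the Exercises
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

is_hermitian = dagger(P) == P
is_unitary = matmul(dagger(P), P) == identity and matmul(P, dagger(P)) == identity
print(is_hermitian, is_unitary)

# Trial eigenvectors (1, lam, lam^2) for the three cube roots of unity
omega = cmath.exp(2j * cmath.pi / 3)
residuals, moduli = [], []
for k in range(3):
    lam = omega ** k
    v = [1, lam, lam ** 2]
    Pv = [sum(P[i][j] * v[j] for j in range(3)) for i in range(3)]
    residuals.append(max(abs(Pv[i] - lam * v[i]) for i in range(3)))
    moduli.append(abs(lam))
print(residuals, moduli)
```

The residuals vanish up to rounding, so the eigenvalues are the three cube roots of unity, each of unit modulus.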


It follows from the spectral theorem for hermitian transformations that
unitary transformations can also be diagonalised by unitary transformations.
The argument uses what is known as the Cayley transformation, which is discussed in detail in
the Problems. It follows from the Cayley transformation that the eigenvalues
of a unitary matrix take values in the unit circle in the complex plane. This
can also be seen directly as follows. Let $U$ be a unitary transformation and
let $\mathbf{v}$ be an eigenvector with eigenvalue $\lambda$. Then consider $\|U(\mathbf{v})\|^2$. Because
$U$ is unitary,

\[
\|U(\mathbf{v})\|^2 = \|\mathbf{v}\|^2 ,
\]

but because $\mathbf{v}$ is an eigenvector,

\[
\|U(\mathbf{v})\|^2 = \|\lambda \mathbf{v}\|^2 = |\lambda|^2 \|\mathbf{v}\|^2 ,
\]

whence $|\lambda|^2 = 1$.

Spectral theorems are extremely powerful in many areas of physics and 
mathematics, and in the next sections we will discuss two such applications. 
However the real power of the spectral theorem manifests itself in quantum 
mechanics, although the version of the theorem used there is the one for 
self-adjoint operators in an infinite-dimensional Hilbert space, which we will 
not have the opportunity to discuss in this course. 


1.4.4 Application: quadratic forms 


In this section we discuss a mathematical application of the spectral theorem 
for real symmetric transformations. 

Let us start with the simplest case of a two-dimensional quadratic form.
By a quadratic form on two variables $(x_1, x_2)$ we mean a quadratic
polynomial of the form

\[
Q(x_1, x_2) = a x_1^2 + 2 b\, x_1 x_2 + c\, x_2^2 , \tag{1.58}
\]

for some real constants $a, b, c$. By a quadric we mean the solutions $(x_1, x_2)$
of an equation of the form

\[
Q(x_1, x_2) = d ,
\]

where $d$ is some real number and $Q$ is a quadratic form. For example, we
can take

\[
Q_1(x_1, x_2) = x_1^2 + x_2^2 ,
\]

in which case the quadrics $Q_1(x_1, x_2) = d$ for $d > 0$ describe a circle of radius
$\sqrt{d}$ in the plane coordinatised by $(x_1, x_2)$. To investigate the type of quadric
that a quadratic form gives rise to, it is convenient to diagonalise it: that is,
change to coordinates $(y_1, y_2)$ for which the mixed term $y_1 y_2$ in the quadratic
form is not present. To tie this to the spectral theorem, it is convenient to
rewrite this in terms of matrices. In terms of the column vector $\mathsf{x}^t = (x_1, x_2)$,
the general two-dimensional quadratic form in equation (1.58) can be written
as

\[
Q(x_1, x_2) = \mathsf{x}^t \mathsf{Q}\, \mathsf{x} ,
\]

where $\mathsf{Q}$ is the matrix

\[
\mathsf{Q} = \begin{pmatrix} a & b \\ b & c \end{pmatrix} .
\]

Because $\mathsf{Q}$ is symmetric, it can be diagonalised by an orthogonal
transformation which is built out of the normalised eigenvectors as was explained
in the previous section. Hence there is an orthogonal matrix $\mathsf{O}$ such that
$\mathsf{Q} = \mathsf{O}\, \mathsf{D}\, \mathsf{O}^t$, where $\mathsf{D}$ is a diagonal matrix with entries $\lambda_i$, for $i = 1, 2$. That
means that in terms of the new coordinates

\[
\mathsf{y} = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \mathsf{O}^t \mathsf{x} ,
\]

the quadratic form is diagonal

\[
Q(y_1, y_2) = \mathsf{y}^t \mathsf{D}\, \mathsf{y} = \lambda_1 y_1^2 + \lambda_2 y_2^2 .
\]

We can further rescale the coordinates $\{y_i\}$: $z_i = \mu_i y_i$, where $\mu_i$ is real
(for instance $\mu_i = \sqrt{|\lambda_i|}$ whenever $\lambda_i \neq 0$). This
means that relative to the new coordinates $z_i$, the quadratic form takes the
form

\[
Q(z_1, z_2) = \varepsilon_1 z_1^2 + \varepsilon_2 z_2^2 ,
\]

where the $\varepsilon_i$ are $0, \pm 1$.
We can distinguish three types of quadrics, depending on the relative 
signs of the eigenvalues: 


1. ($\varepsilon_1 \varepsilon_2 = 1$) In this case the eigenvalues have the same sign and the
quadric is an ellipse.


2. ($\varepsilon_1 \varepsilon_2 = -1$) In this case the eigenvalues have different sign and the
quadric is a hyperbola.


3. ($\varepsilon_1 \varepsilon_2 = 0$) In this case one of the eigenvalues is zero, and the quadric
consists of a pair of lines.
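The three cases above can be turned into a small classifier. This is a sketch only: the function names, the tolerance used to decide when an eigenvalue counts as zero, and the closed-form eigenvalues of a symmetric $2 \times 2$ matrix are assumptions of this example, and the labels simply follow the three cases in the text.

```python
import math

def eigenvalues_sym_2x2(a, b, c):
    """Eigenvalues of the symmetric matrix [[a, b], [b, c]], smallest first."""
    mean = (a + c) / 2
    radius = math.hypot((a - c) / 2, b)
    return mean - radius, mean + radius

def sign(x, tol=1e-12):
    """Return 0, +1 or -1; values within tol of zero count as zero."""
    if abs(x) <= tol:
        return 0
    return 1 if x > 0 else -1

def classify(a, b, c):
    """Classify Q = a x1^2 + 2 b x1 x2 + c x2^2 by the sign of eps1 * eps2."""
    lam1, lam2 = eigenvalues_sym_2x2(a, b, c)
    product = sign(lam1) * sign(lam2)
    return {1: "ellipse", -1: "hyperbola", 0: "pair of lines"}[product]

print(classify(1, 0, 1))   # x1^2 + x2^2
print(classify(0, 1, 0))   # 2 x1 x2
print(classify(1, 1, 1))   # (x1 + x2)^2
```

The classification only looks at the signs of the eigenvalues, which is all that survives the rescaling to the $z_i$ coordinates.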


The general case is not much more complicated. Let $\mathbb{V}$ be a real vector
space of dimension $N$ with an inner product. By a quadratic form we mean
a symmetric bilinear form $Q : \mathbb{V} \times \mathbb{V} \to \mathbb{R}$. In other words, $Q$ satisfies axioms
IP1 and IP2 of an inner product, but IP3 need not be satisfied. Associated
to every quadratic form there is a linear transformation in $\mathbb{V}$, which we also
denote $Q$, defined as follows

\[
\langle \mathbf{v}, Q(\mathbf{w}) \rangle = Q(\mathbf{v}, \mathbf{w}) .
\]

Symmetry of the bilinear form implies that the linear transformation $Q$ is
also symmetric:

\[
\langle \mathbf{v}, Q(\mathbf{w}) \rangle = Q(\mathbf{v}, \mathbf{w}) = Q(\mathbf{w}, \mathbf{v}) = \langle \mathbf{w}, Q(\mathbf{v}) \rangle = \langle Q(\mathbf{v}), \mathbf{w} \rangle .
\]

Hence it can be diagonalised by an orthogonal transformation. Relative to
an orthonormal basis $\{\mathbf{e}_i\}$ for $\mathbb{V}$, $Q$ is represented by a symmetric matrix $\mathsf{Q}$.
Let $\mathsf{O}$ be an orthogonal matrix which diagonalises $\mathsf{Q}$; that is, $\mathsf{Q} = \mathsf{O}\, \mathsf{D}\, \mathsf{O}^t$,
with $\mathsf{D}$ diagonal.

We can further change basis to an orthogonal basis whose elements are
however no longer normalised, in such a way that the resulting matrix $\mathsf{D}'$
is still diagonal with all its entries either $\pm 1$ or $0$. Let $(n_+, n_-, n_0)$ denote,
respectively, the number of positive, negative and zero diagonal entries of $\mathsf{D}'$.
There is a result, known as Sylvester's Law of Inertia, which says that the
numbers $(n_+, n_-, n_0)$ are an invariant of the quadratic form, so that they can
be computed from the matrix of the quadratic form relative to any basis.
A quadratic form is said to be non-degenerate if $n_0 = 0$. It is said to be
positive-definite if $n_- = n_0 = 0$, and negative-definite if $n_+ = n_0 = 0$.
Clearly a quadratic form is an inner product when it is positive-definite. A
non-degenerate quadratic form, which is not necessarily positive- or
negative-definite, defines a generalised inner product on $\mathbb{V}$. There are two integers
which characterise a non-degenerate quadratic form: the dimension $N$ of
the vector space, and the signature $n_+ - n_-$. Notice that the signature
is bounded above by the dimension, the bound being saturated when
the quadratic form is positive-definite. There are plenty of interesting
non-degenerate quadratic forms which are not positive-definite. For example,
Minkowski spacetime in the theory of special relativity possesses a quadratic
form with dimension 4 and signature 2.
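Counting the invariants $(n_+, n_-, n_0)$ from a diagonalised form is straightforward. In the sketch below the function name `inertia`, the zero tolerance, and the sign convention $(-,+,+,+)$ for the Minkowski form are assumptions made for illustration; the convention is chosen so that the signature agrees with the value quoted in the text.

```python
def inertia(diagonal, tol=1e-12):
    """Return (n_plus, n_minus, n_zero) for the diagonal entries of a quadratic form."""
    n_plus = sum(1 for d in diagonal if d > tol)
    n_minus = sum(1 for d in diagonal if d < -tol)
    n_zero = len(diagonal) - n_plus - n_minus
    return n_plus, n_minus, n_zero

def describe(diagonal):
    """Dimension, signature and definiteness of a diagonalised quadratic form."""
    n_plus, n_minus, n_zero = inertia(diagonal)
    return {
        "dimension": len(diagonal),
        "signature": n_plus - n_minus,
        "non_degenerate": n_zero == 0,
        "positive_definite": n_minus == 0 and n_zero == 0,
        "negative_definite": n_plus == 0 and n_zero == 0,
    }

minkowski = [-1.0, 1.0, 1.0, 1.0]   # assumed sign convention (-, +, +, +)
print(inertia(minkowski))
print(describe(minkowski))
```

By Sylvester's Law of Inertia the same counts would be obtained from the diagonal form reached through any other change of basis.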


1.4.5 Application: normal modes 


This section discusses the powerful method of normal modes to decouple
interacting mechanical systems near equilibrium. It is perhaps not too
exaggerated to suggest that theoretical physicists spend a large part of their lives
studying the problem of normal modes in one way or another.

We start with a simple example.

Consider an idealised one-dimensional mechanical system consisting of two
point masses each of mass $m$ connected by springs to each other and to two
fixed ends. We will neglect gravity, friction and the mass of the springs. The
springs obey Hooke's law with spring constant $k$. We assume that the system
is at equilibrium when the springs are relaxed, and we want to study the system
around equilibrium; that is, we wish to study small displacements of the masses. We
let $x_i$ for $i = 1, 2$ denote the displacements from equilibrium for each of the
two point masses, as shown below.


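Then the potential energy due to the springs is the sum of the potential
energies of each of the springs:

\[
\begin{aligned}
V &= \tfrac{1}{2} k\, x_1^2 + \tfrac{1}{2} k\, (x_2 - x_1)^2 + \tfrac{1}{2} k\, x_2^2 \\
&= k \left( x_1^2 + x_2^2 - x_1 x_2 \right) .
\end{aligned}
\]

The kinetic energy is given by

\[
T = \tfrac{1}{2} m\, \dot{x}_1^2 + \tfrac{1}{2} m\, \dot{x}_2^2 .
\]

The equations of motion are then, for $i = 1, 2$,

\[
\frac{d}{dt} \frac{\partial T}{\partial \dot{x}_i} = - \frac{\partial V}{\partial x_i} .
\]

Explicitly, we have the following coupled system of second order ordinary
differential equations:

\[
\begin{aligned}
m\, \ddot{x}_1 &= -2 k\, x_1 + k\, x_2 \\
m\, \ddot{x}_2 &= -2 k\, x_2 + k\, x_1 .
\end{aligned}
\]

Let us write this in matrix form. We introduce a column vector $\mathsf{x}^t = (x_1, x_2)$.
Then the above system of equations becomes

\[
\ddot{\mathsf{x}} = -\omega^2\, \mathsf{K}\, \mathsf{x} , \tag{1.59}
\]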


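where $\mathsf{K}$ is the matrix

\[
\mathsf{K} = \begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix} ,
\]

and where we have introduced the notation

\[
\omega = \sqrt{\frac{k}{m}} .
\]

Notice that $\mathsf{K}$ is symmetric, hence it can be diagonalised by an orthogonal
transformation. Let us find its eigenvalues and its eigenvectors. The
characteristic polynomial of $\mathsf{K}$ is given by

\[
\chi_{\mathsf{K}}(\lambda) = \begin{vmatrix} 2 - \lambda & -1 \\ -1 & 2 - \lambda \end{vmatrix} = (2 - \lambda)^2 - 1 = (\lambda - 1)(\lambda - 3) ,
\]

from which it follows that it has as roots $\lambda = 1, 3$. The normalised
eigenvectors corresponding to these eigenvalues are

\[
\frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix} \qquad \text{and} \qquad \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ -1 \end{pmatrix} ,
\]

respectively. We build the following matrix $\mathsf{O}$ out of the normalised
eigenvectors

\[
\mathsf{O} = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} .
\]

One can check that $\mathsf{O}$ is orthogonal: $\mathsf{O}^t = \mathsf{O}^{-1}$. One can also check that

\[
\mathsf{K} = \mathsf{O}\, \mathsf{D}\, \mathsf{O}^t ,
\]

where $\mathsf{D}$ is the diagonal matrix

\[
\mathsf{D} = \begin{pmatrix} 1 & 0 \\ 0 & 3 \end{pmatrix} .
\]

Inserting this expression into equation (1.59), we see that

\[
\ddot{\mathsf{x}} = -\omega^2\, \mathsf{O}\, \mathsf{D}\, \mathsf{O}^t\, \mathsf{x} .
\]

In terms of the new variables

\[
\mathsf{y} = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \mathsf{O}^t \mathsf{x} ,
\]

the equation of motion (1.59) becomes

\[
\ddot{\mathsf{y}} = -\omega^2\, \mathsf{D}\, \mathsf{y} . \tag{1.60}
\]

Because the matrix $\mathsf{D}$ is diagonal, the equations of motion for the new
variables $y_i$ are now decoupled: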


\[
\ddot{y}_1 = -\omega^2 y_1 \qquad \text{and} \qquad \ddot{y}_2 = -3 \omega^2 y_2 .
\]

One can now easily solve these equations,

\[
\begin{aligned}
y_1(t) &= A_1 \cos(\omega_1 t + \varphi_1) \\
y_2(t) &= A_2 \cos(\omega_2 t + \varphi_2) ,
\end{aligned}
\]

where $\omega_1 = \omega$, $\omega_2 = \sqrt{3}\, \omega$ and $A_i$ and $\varphi_i$ are constants to be determined from
the initial conditions. The physical variables in the original problem are the
displacements $x_i$ of each of the point masses. They can be found in terms of
the new decoupled variables $y_i$ simply by inverting the change of variables
used to obtain (1.60). Explicitly,

\[
\begin{aligned}
x_1(t) &= \frac{A_1}{\sqrt{2}} \cos(\omega_1 t + \varphi_1) + \frac{A_2}{\sqrt{2}} \cos(\omega_2 t + \varphi_2) \\
x_2(t) &= \frac{A_1}{\sqrt{2}} \cos(\omega_1 t + \varphi_1) - \frac{A_2}{\sqrt{2}} \cos(\omega_2 t + \varphi_2) .
\end{aligned}
\]

Variables like the $y_i$ which decouple the equations of motion are called
the normal modes of the mechanical system. Their virtue is that they
reduce an interacting (i.e., coupled) mechanical system around equilibrium
to a set of independent free oscillators. These free oscillators are
mathematical constructs: the normal modes do not generally correspond to
the motion of any of the masses in the original system, but they nevertheless
possess a certain "physicality" and it is fruitful to work with them as if they
were physical. The original physical variables can then be understood as
linear combinations of the normal modes as we saw above. The frequencies
$\omega_i$ of the normal modes are known as the characteristic frequencies of the
mechanical system. In particle physics, for example, the elementary particles
are the normal modes and their masses are the characteristic frequencies.

To illustrate the simplification in the dynamics which results from
considering the normal modes, in Figure 1.1 we have sketched the motion of the
two masses in the problem and of the two normal modes, with time running
horizontally to the right.

Notice also that although the motion of each of the normal modes is
periodic, the system as a whole is not. This is due to the fact that the
characteristic frequencies are not rational multiples of each other.
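The normal-mode solution of the two-mass example can be checked numerically against the coupled equations of motion $\ddot{x}_1 = -\omega^2 (2 x_1 - x_2)$ and $\ddot{x}_2 = -\omega^2 (2 x_2 - x_1)$. The sketch below is only an illustration: the value $\omega = 1$, the amplitudes and phases, the finite-difference step and the function names are all assumptions chosen for the example.

```python
import math

OMEGA = 1.0                              # omega = sqrt(k/m); illustrative value
W1, W2 = OMEGA, math.sqrt(3) * OMEGA     # characteristic frequencies

def positions(t, A1=1.0, A2=0.5, phi1=0.0, phi2=0.3):
    """Displacements (x1, x2) rebuilt from the two normal modes."""
    y1 = A1 * math.cos(W1 * t + phi1)
    y2 = A2 * math.cos(W2 * t + phi2)
    s = 1 / math.sqrt(2)
    return s * (y1 + y2), s * (y1 - y2)

def acceleration(t, h=1e-4):
    """Second time derivative of (x1, x2) by central differences."""
    xm, x0, xp = positions(t - h), positions(t), positions(t + h)
    return tuple((xp[i] - 2 * x0[i] + xm[i]) / h ** 2 for i in range(2))

def residual(t):
    """Largest violation of the coupled equations of motion at time t."""
    x1, x2 = positions(t)
    a1, a2 = acceleration(t)
    r1 = a1 + OMEGA ** 2 * (2 * x1 - x2)
    r2 = a2 + OMEGA ** 2 * (2 * x2 - x1)
    return max(abs(r1), abs(r2))

worst = max(residual(0.1 * n) for n in range(200))
print(worst)   # small: limited only by the finite-difference error
```

The residual stays at the level of the discretisation error, which indicates that the superposition of the two free oscillators does solve the coupled system.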


Let us see why. Suppose that we have two oscillators with frequencies $\omega_1$ and $\omega_2$. That
means that the oscillators are periodic with periods $T_1 = 2\pi/\omega_1$ and $T_2 = 2\pi/\omega_2$. The
combined system will be periodic provided that $N_1 T_1 = N_2 T_2$ for some positive integers $N_i$. But
this means that

\[
\frac{\omega_1}{\omega_2} = \frac{N_1}{N_2} ,
\]

which is a rational number. In the problem treated above, the ratio

\[
\frac{\omega_1}{\omega_2} = \frac{1}{\sqrt{3}}
\]

is irrational. Therefore the motion is aperiodic.


(a) Point masses (b) Normal modes


Figure 1.1: Dynamics of point masses and normal modes.
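For contrast, when the ratio of frequencies is rational the combined motion does repeat, with period $N_1 T_1 = N_2 T_2$. The following sketch checks this for an assumed ratio $\omega_1/\omega_2 = 2/3$; the frequencies, the phase offset and the function names are illustrative choices.

```python
import math
from fractions import Fraction

def common_period(w1, ratio):
    """Period N1 * T1 of the combined motion when w1 / w2 = N1 / N2 in lowest terms."""
    n1 = ratio.numerator              # Fraction keeps N1 / N2 in lowest terms
    return n1 * 2 * math.pi / w1      # N1 * T1, which equals N2 * T2

w1 = 2.0
ratio = Fraction(2, 3)                          # assumed rational ratio w1 / w2
w2 = w1 * ratio.denominator / ratio.numerator   # hence w2 = 3.0
T = common_period(w1, ratio)

def x(t):
    """Superposition of two oscillators with the chosen frequencies."""
    return math.cos(w1 * t) + math.cos(w2 * t + 0.4)

gap = max(abs(x(0.1 * n + T) - x(0.1 * n)) for n in range(100))
print(T, gap)   # the motion repeats after time T up to rounding
```

With $\omega_2 = \sqrt{3}\, \omega_1$ no such integers $N_1, N_2$ exist, which is the statement made in the text.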


If we were to plot the trajectory of the system in the plane, with the
trajectory of one of the point masses along the x-axis and the trajectory of
the other point mass along the y-axis, we would see that the orbit never repeats,
and that we end up filling up the available configuration space. In Figure 1.2
we have plotted the cumulative trajectory of the system after letting it
run for $T$ units of time, for different values of $T$. As you can see, as $T$ grows
the system has visited more and more points in the available configuration
space. Asymptotically, as $T \to \infty$, the system will have visited the whole
available space.


1.4.6 Application: near equilibrium dynamics 


In this section we will consider a more general mechanical system near
equilibrium. Throughout the section $\mathbb{V}$ will be a real finite-dimensional vector
space with an inner product.

Consider a mechanical system whose configuration space is $\mathbb{V}$. For
example, it could be a system of $n$ point particles in $d$ dimensions, and then
$\mathbb{V}$ would be an $(nd)$-dimensional vector space. In the previous section we
discussed the case of a one-dimensional system consisting of two point
particles, so that $\mathbb{V}$ was two-dimensional. In the Problems we looked at systems
with three-dimensional $\mathbb{V}$. In this section we are letting $\mathbb{V}$ be arbitrary but
finite-dimensional.

The potential energy is given by a function $V : \mathbb{V} \to \mathbb{R}$. The configurations
of mechanical equilibrium are those for which the gradient of the potential
vanishes. Hence let us consider one such equilibrium configuration $\mathbf{q}_0 \in \mathbb{V}$:

\[
\nabla V \big|_{\mathbf{q}_0} = 0 .
\]

(e) T = 100 (f) T = 300
Figure 1.2: Trajectory of the mechanical system at different times.


Because the potential energy is only defined up to an additive constant, we
are free to choose it such that $V(\mathbf{q}_0) = 0$. We can therefore expand the
potential function $V$ about $\mathbf{q}_0$ and the first contribution will be quadratic:

\[
\begin{aligned}
V(\mathbf{q}) &= V(\mathbf{q}_0) + \langle \nabla V|_{\mathbf{q}_0}, \mathbf{q} - \mathbf{q}_0 \rangle + \tfrac{1}{2} \langle \mathbf{q} - \mathbf{q}_0, H(\mathbf{q} - \mathbf{q}_0) \rangle + \cdots \\
&= \tfrac{1}{2} \langle \mathbf{q} - \mathbf{q}_0, H(\mathbf{q} - \mathbf{q}_0) \rangle + \cdots ,
\end{aligned}
\]

where $H : \mathbb{V} \to \mathbb{V}$ is a symmetric linear transformation known as the Hessian
of $V$ at $\mathbf{q}_0$. Explicitly, if we choose an orthonormal basis $\{\mathbf{e}_i\}$ for $\mathbb{V}$, then
let $\mathbf{q} = \sum_i q_i \mathbf{e}_i$ define some coordinates $q_i$ for the configuration space. Then
relative to this basis the Hessian of $V$ has matrix elements

\[
H_{ij} = \langle \mathbf{e}_i, H(\mathbf{e}_j) \rangle = \left. \frac{\partial^2 V}{\partial q_i\, \partial q_j} \right|_{\mathbf{q}_0} ,
\]

which shows manifestly that it is symmetric: $H_{ij} = H_{ji}$. Let us define
$\mathbf{x} = \mathbf{q} - \mathbf{q}_0$ to be the displacements about equilibrium. These will be our
dynamical variables. The potential energy in the quadratic approximation is
given by

\[
V = \tfrac{1}{2} \langle \mathbf{x}, H(\mathbf{x}) \rangle .
\]

We will make the assumption that the kinetic energy is quadratic in the
velocities $\dot{\mathbf{x}}$:

\[
T = \tfrac{1}{2} \langle \dot{\mathbf{x}}, M(\dot{\mathbf{x}}) \rangle ,
\]

where the mass matrix $M$ is assumed to be symmetric and positive-definite;
that is, all its eigenvalues are positive.

We will now analyse the dynamics of small displacements from
equilibrium according to the following prescription:


1. we will standardise the kinetic energy by diagonalising and normalising 
the mass matrix; and 


2. we will then diagonalise the potential energy and solve for the normal 
modes and characteristic frequencies of this system. 


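Both steps make use of the spectral theorem for symmetric
transformations. To do the first step notice that relative to an orthonormal basis $\{\mathbf{e}_i\}$
for $\mathbb{V}$, $\mathbf{x} = \sum_i x_i \mathbf{e}_i$ and we can form a column vector

\[
\mathsf{x} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{pmatrix}
\]

out of the components of $\mathbf{x}$. Relative to this basis, the mass matrix $M$ and
the Hessian $H$ have matrices $\mathsf{M}$ and $\mathsf{H}$, respectively. By assumption both are
symmetric, and $\mathsf{M}$ is in addition positive-definite. The kinetic and potential
energies become

\[
T = \tfrac{1}{2}\, \dot{\mathsf{x}}^t \mathsf{M}\, \dot{\mathsf{x}} \qquad \text{and} \qquad V = \tfrac{1}{2}\, \mathsf{x}^t \mathsf{H}\, \mathsf{x} .
\]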


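Because $\mathsf{M}$ is symmetric, there is an orthogonal matrix $\mathsf{O}_1$ such that
$\mathsf{M}' = \mathsf{O}_1^t \mathsf{M}\, \mathsf{O}_1$ is diagonal with positive entries. Let $\mathsf{D}_1$ be the diagonal
matrix whose entries are the (positive) square roots of the diagonal entries
of $\mathsf{M}'$. In other words, $\mathsf{M}' = \mathsf{D}_1^2$. We can therefore write

\[
\mathsf{M} = \mathsf{O}_1 \mathsf{D}_1^2\, \mathsf{O}_1^t = (\mathsf{O}_1 \mathsf{D}_1)\, (\mathsf{O}_1 \mathsf{D}_1)^t ,
\]

where we have used that $\mathsf{D}_1^t = \mathsf{D}_1$ since it is diagonal. Introduce then the
following variables

\[
\mathsf{y} = (\mathsf{O}_1 \mathsf{D}_1)^t\, \mathsf{x} = \mathsf{D}_1 \mathsf{O}_1^t\, \mathsf{x} .
\]

We can invert this change of variables as follows:

\[
\mathsf{x} = \mathsf{O}_1 \mathsf{D}_1^{-1}\, \mathsf{y} ,
\]

where we have used that $\mathsf{O}_1$ is orthogonal, so that $\mathsf{O}_1^t = \mathsf{O}_1^{-1}$. This change of
variables accomplishes the first step outlined above, since in terms of $\mathsf{y}$, the
kinetic energy becomes simply

\[
T = \tfrac{1}{2}\, \dot{\mathsf{y}}^t \dot{\mathsf{y}} = \tfrac{1}{2}\, \|\dot{\mathsf{y}}\|^2 .
\]

Similarly, the potential energy has become

\[
V = \tfrac{1}{2}\, \mathsf{y}^t \mathsf{K}\, \mathsf{y} ,
\]

where the matrix $\mathsf{K}$ is defined by

\[
\mathsf{K} = \mathsf{D}_1^{-1} \mathsf{O}_1^t\, \mathsf{H}\, \mathsf{O}_1 \mathsf{D}_1^{-1} ,
\]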


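which is clearly symmetric since $\mathsf{H}$ and $\mathsf{D}_1$ are. Therefore we can find a
second orthogonal matrix $\mathsf{O}_2$ such that $\mathsf{O}_2^t \mathsf{K}\, \mathsf{O}_2$ is diagonal; call this matrix
$\mathsf{D}$. Let us define a new set of variables

\[
\mathsf{z} = \mathsf{O}_2^t\, \mathsf{y} ,
\]

relative to which the kinetic energy remains simple

\[
T = \tfrac{1}{2}\, \|\mathsf{O}_2 \dot{\mathsf{z}}\|^2 = \tfrac{1}{2}\, \|\dot{\mathsf{z}}\|^2 ,
\]

since orthogonal matrices preserve norms, and the potential energy
diagonalises

\[
V = \tfrac{1}{2}\, \mathsf{z}^t \mathsf{D}\, \mathsf{z} .
\]

Because $\mathsf{D}$ is diagonal, the equations of motion of the $\mathsf{z}$ are decoupled:

\[
\ddot{\mathsf{z}} = -\mathsf{D}\, \mathsf{z} ,
\]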
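The two-step prescription can be sketched for the simplest case of two degrees of freedom with a diagonal mass matrix, where the first orthogonal matrix may be taken to be the identity. The function name `mode_eigenvalues`, the restriction to a diagonal mass matrix and the numerical values of $k$ and $m$ are assumptions made for this example; the Hessian used is that of the two-mass system of the previous section.

```python
import math

def mode_eigenvalues(m1, m2, H):
    """Diagonal entries of D for masses (m1, m2) and a symmetric 2x2 Hessian H.

    Assumes a diagonal mass matrix, so that O_1 is the identity and
    D_1 = diag(sqrt(m1), sqrt(m2)).
    """
    d1, d2 = math.sqrt(m1), math.sqrt(m2)
    # K = D_1^{-1} H D_1^{-1}, written out entry by entry
    a = H[0][0] / (d1 * d1)
    b = H[0][1] / (d1 * d2)
    c = H[1][1] / (d2 * d2)
    # Eigenvalues of the symmetric matrix [[a, b], [b, c]]
    mean = (a + c) / 2
    radius = math.hypot((a - c) / 2, b)
    return [mean - radius, mean + radius]

k, m = 3.0, 1.5                          # illustrative spring constant and mass
H = [[2 * k, -k], [-k, 2 * k]]           # Hessian of V = k (x1^2 + x2^2 - x1 x2)
lams = mode_eigenvalues(m, m, H)
freqs = [math.sqrt(lam) for lam in lams]
print(lams)    # close to k/m and 3 k/m
print(freqs)   # characteristic frequencies omega and sqrt(3) omega
```

For equal masses this reproduces the characteristic frequencies $\omega$ and $\sqrt{3}\, \omega$ found earlier, with $\omega^2 = k/m$.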


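whence the $\mathsf{z}$ are the normal modes of the system. Let $\mathsf{D}$ have entries

\[
\mathsf{D} = \begin{pmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \ddots & \\ & & & \lambda_N \end{pmatrix} .
\]

Then the equations of motion for the normal modes are

\[
\ddot{z}_i = -\lambda_i z_i .
\]

We can distinguish three types of solutions:

1. ($\lambda_i > 0$) The solution is oscillatory with characteristic frequency $\omega_i = \sqrt{\lambda_i}$:

\[
z_i(t) = A_i \cos(\omega_i t + \varphi_i) .
\]

2. ($\lambda_i = 0$) The solution is linear

\[
z_i(t) = a_i + b_i t .
\]

Such a normal mode is said to be a zero mode, since it has zero
characteristic frequency.


3. ($\lambda_i < 0$) The solution is exponential

\[
z_i(t) = A_i \exp\left( \sqrt{|\lambda_i|}\, t \right) + B_i \exp\left( -\sqrt{|\lambda_i|}\, t \right) .
\]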

If all eigenvalues $\lambda_i$ are positive the equilibrium point is said to be stable,
if they are all non-negative then it is semi-stable, whereas if there is a
negative eigenvalue, then the equilibrium is unstable. The signs of the eigenvalues
of the matrix $\mathsf{D}$ agree with the signs of the eigenvalues of the Hessian matrix
of the potential at the equilibrium point. The different types of equilibria are
illustrated in Figure 1.3, which shows the behaviour of the potential function
around an equilibrium point in the simple case of a two-dimensional
configuration space $\mathbb{V}$. The existence of zero modes is symptomatic of flat directions
in the potential along which the system can evolve without spending any
energy. This usually signals the existence of some continuous symmetry in the
system. In the Figure we see that the semi-stable equilibrium point indeed
has a flat direction along which the potential is constant. In other words,
translation along the flat direction is a symmetry of the potential function.

(a) stable (b) semi-stable (c) unstable


Figure 1.3: Different types of equilibrium points.

Chapter 2


Complex Analysis 


In this part of the course we will study some basic complex analysis. This is 
an extremely useful and beautiful part of mathematics and forms the basis 
of many techniques employed in many branches of mathematics and physics. 
We will extend the notions of derivatives and integrals, familiar from calculus, 
to the case of complex functions of a complex variable. In so doing we will 
come across analytic functions, which form the centerpiece of this part of the 
course. In fact, to a large extent complex analysis is the study of analytic 
functions. After a brief review of complex numbers as points in the complex 
plane, we will first discuss analyticity and give plenty of examples of analytic 
functions. We will then discuss complex integration, culminating with the 
generalised Cauchy Integral Formula, and some of its applications. We then 
go on to discuss the power series representations of analytic functions and 
the residue calculus, which will allow us to compute many real integrals and 
infinite sums very easily via complex integration. 


2.1 Analytic functions 


In this section we will study complex functions of a complex variable. We 
will see that differentiability of such a function is a non-trivial property, 
giving rise to the concept of an analytic function. We will then study many 
examples of analytic functions. In fact, the construction of analytic functions 
will form a basic leitmotif for this part of the course. 


2.1.1 The complex plane 


We already discussed complex numbers briefly in an earlier section. The emphasis
in that section was on the algebraic properties of complex numbers, and
although these properties are of course important here as well and will be
used all the time, we are now also interested in more geometric properties of
the complex numbers.

The set $\mathbb{C}$ of complex numbers is naturally identified with the plane $\mathbb{R}^2$.
This is often called the Argand plane.

Given a complex number $z = x + i y$, its real and imaginary parts define
an element $(x, y)$ of $\mathbb{R}^2$, as shown in the figure. In fact this identification
is one of real vector spaces, in the sense that adding complex numbers
and multiplying them by real scalars mimic the similar operations one can
do in $\mathbb{R}^2$. Indeed, if $a \in \mathbb{R}$ is real,
then to $a z = (a x) + i (a y)$ there corresponds the pair
$(a x, a y) = a\, (x, y)$. Similarly, if $z_1 = x_1 + i y_1$ and $z_2 = x_2 + i y_2$ are
complex numbers, then $z_1 + z_2 = (x_1 + x_2) + i (y_1 + y_2)$, whose associated pair
is $(x_1 + x_2, y_1 + y_2) = (x_1, y_1) + (x_2, y_2)$. In fact, the identification is even
one of euclidean spaces. Given a complex number $z = x + i y$, its modulus
$|z|$, defined by $|z|^2 = z z^*$, is given by $\sqrt{x^2 + y^2}$ which is precisely the norm
$\|(x, y)\|$ of the pair $(x, y)$. Similarly, if $z_1 = x_1 + i y_1$ and $z_2 = x_2 + i y_2$,
then $\mathrm{Re}(z_1^* z_2) = x_1 x_2 + y_1 y_2$ which is the dot product of the pairs $(x_1, y_1)$
and $(x_2, y_2)$. In particular, it follows from these remarks and the triangle
inequality for the norm in $\mathbb{R}^2$, that complex numbers obey a version of the
triangle inequality:

\[
|z_1 + z_2| \leq |z_1| + |z_2| . \tag{2.1}
\]
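These identifications are easy to confirm with Python's built-in complex numbers. The two sample numbers below are arbitrary choices made for this sketch.

```python
import math

z1, z2 = complex(3, 4), complex(-1, 2)   # arbitrary sample values

# The modulus agrees with the euclidean norm of the pair (x, y)
modulus = abs(z1)
norm = math.hypot(z1.real, z1.imag)

# Re(z1* z2) agrees with the dot product of the associated pairs
re_part = (z1.conjugate() * z2).real
dot = z1.real * z2.real + z1.imag * z2.imag

# Triangle inequality (2.1)
triangle_ok = abs(z1 + z2) <= abs(z1) + abs(z2)

print(modulus, norm)
print(re_part, dot)
print(triangle_ok)
```

Note that it is the conjugate of the first factor that produces the dot product; the real part of the plain product $z_1 z_2$ is $x_1 x_2 - y_1 y_2$ instead.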


Polar form and the argument function 


Points in the plane can also be represented using polar coordinates, and
this representation in turn translates into a representation of the complex
numbers.

Let $(x, y)$ be a point in the plane. If we define $r = \sqrt{x^2 + y^2}$ and
$\theta$ by $\theta = \arctan(y/x)$ (with $\theta$ chosen in the quadrant in which $(x, y)$ lies),
then we can write
$(x, y) = (r \cos\theta, r \sin\theta) = r (\cos\theta, \sin\theta)$. The complex
number $z = x + i y$ can then be written as $z = r (\cos\theta + i \sin\theta)$.
The real number $r$, as we have seen, is the modulus
$|z|$ of $z$, and the complex number $\cos\theta + i \sin\theta$ has unit
modulus. Comparing the Taylor series for the cosine and
sine functions and the exponential function we notice that $\cos\theta + i \sin\theta = e^{i\theta}$.
The angle $\theta$ is called the argument of $z$ and is written $\arg(z)$. Therefore we
have the following polar form for a complex number $z$:

\[
z = |z|\, e^{i \arg(z)} . \tag{2.2}
\]


Being an angle, the argument of a complex number is only defined up to the
addition of integer multiples of $2\pi$. In other words, it is a multiple-valued
function. This ambiguity can be resolved by defining the principal value
Arg of the arg function to take values in the interval $(-\pi, \pi]$; that is, for any
complex number $z$, one has

\[
-\pi < \mathrm{Arg}(z) \leq \pi . \tag{2.3}
\]

Notice, however, that Arg is not a continuous function: it has a discontinuity
along the negative real axis. Approaching a point on the negative real axis
from the upper half-plane, the principal value of its argument approaches $\pi$,
whereas if we approach it from the lower half-plane, the principal value of
its argument approaches $-\pi$. Notice finally that whereas the modulus is a
multiplicative function: $|z w| = |z| |w|$, the argument is additive: $\arg(z_1 z_2) =
\arg(z_1) + \arg(z_2)$, provided that we understand the equation to hold up to
integer multiples of $2\pi$. Also notice that whereas the modulus is invariant
under conjugation $|z^*| = |z|$, the argument changes sign $\arg(z^*) = -\arg(z)$,
again up to integer multiples of $2\pi$.
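The behaviour of the principal value can be explored with `cmath.phase` from the Python standard library, which returns a value in $[-\pi, \pi]$; it agrees with (2.3) except for the floating-point edge case of a negative zero imaginary part, where it returns $-\pi$. The wrapper name `Arg`, the helper `same_angle` and the sample points are choices made for this sketch.

```python
import cmath
import math

def Arg(z):
    """Principal value of the argument, computed with cmath.phase."""
    return cmath.phase(z)

def same_angle(a, b, tol=1e-12):
    """True if the angles a and b agree up to an integer multiple of 2 pi."""
    return abs(cmath.exp(1j * a) - cmath.exp(1j * b)) < tol

# Discontinuity across the negative real axis
above = Arg(complex(-1.0, 1e-12))    # just above the axis: close to +pi
below = Arg(complex(-1.0, -1e-12))   # just below the axis: close to -pi

# Additivity holds only up to multiples of 2 pi
z, w = cmath.rect(2.0, 3.0), cmath.rect(1.5, 2.5)
additive_mod_2pi = same_angle(Arg(z * w), Arg(z) + Arg(w))
additive_exactly = math.isclose(Arg(z * w), Arg(z) + Arg(w))

# Conjugation flips the sign of the argument
conjugate_flips = same_angle(Arg(z.conjugate()), -Arg(z))

print(above, below)
print(additive_mod_2pi, additive_exactly, conjugate_flips)
```

Here the arguments 3 and 2.5 add up to 5.5, which lies outside $(-\pi, \pi]$, so the principal value of the product differs from the naive sum by $2\pi$.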


Some important subsets of the complex plane 


We end this section with a brief discussion of some very important subsets
of the complex plane. Let $z_0$ be any complex number, and consider all those
complex numbers $z$ which are a distance less than $\varepsilon$ away from $z_0$. These
points form a disk of radius $\varepsilon$ centred at $z_0$. More precisely, let us define the
open $\varepsilon$-disk around $z_0$ to be the subset $D_\varepsilon(z_0)$ of the complex plane defined
by

\[
D_\varepsilon(z_0) = \{ z \in \mathbb{C} \mid |z - z_0| < \varepsilon \} . \tag{2.4}
\]

Similarly one defines the closed $\varepsilon$-disk around $z_0$ to be the subset

\[
\bar{D}_\varepsilon(z_0) = \{ z \in \mathbb{C} \mid |z - z_0| \leq \varepsilon \} , \tag{2.5}
\]

which consists of the open $\varepsilon$-disk and the circle $|z - z_0| = \varepsilon$ which forms its
boundary. More generally a subset $U \subset \mathbb{C}$ of the complex plane is said to be
open if given any $z \in U$, there exists some positive real number $\varepsilon > 0$ (which
can depend on $z$) such that the open $\varepsilon$-disk around $z$ also belongs to $U$. A set
$C$ is said to be closed if its complement $C^c = \{ z \in \mathbb{C} \mid z \notin C \}$ (that is, all
those points not in $C$) is open. One should keep in mind that generic subsets
of the complex plane are neither closed nor open. By a neighbourhood of a
point $z_0$ in the complex plane, we will mean any open set containing $z_0$. For
example, any open $\varepsilon$-disk around $z_0$ is a neighbourhood of $z_0$.


Let us see that the open and closed $\varepsilon$-disks are indeed open and closed, respectively. Let
$z \in D_\varepsilon(z_0)$. This means that $|z - z_0| = \delta < \varepsilon$. Consider the disk $D_{\varepsilon - \delta}(z)$. We claim that
this disk is contained in $D_\varepsilon(z_0)$. Indeed, if $|w - z| < \varepsilon - \delta$ then,

\[
\begin{aligned}
|w - z_0| &= |(w - z) + (z - z_0)| && \text{(adding and subtracting $z$)} \\
&\leq |w - z| + |z - z_0| && \text{(by the triangle inequality (2.1))} \\
&< \varepsilon - \delta + \delta \\
&= \varepsilon .
\end{aligned}
\]

Therefore the disk $D_\varepsilon(z_0)$ is indeed open. Consider now the subset $\bar{D}_\varepsilon(z_0)$. Its complement
is the subset of points $z$ in the complex plane such that $|z - z_0| > \varepsilon$. We will show that it
is an open set. Let $z$ be such that $|z - z_0| = \eta > \varepsilon$. Then consider the open disk $D_{\eta - \varepsilon}(z)$,
and let $w$ be a point in it. Then

\[
\begin{aligned}
|z - z_0| &= |(z - w) + (w - z_0)| && \text{(adding and subtracting $w$)} \\
&\leq |z - w| + |w - z_0| . && \text{(by the triangle inequality (2.1))}
\end{aligned}
\]

We can rewrite this as

\[
\begin{aligned}
|w - z_0| &\geq |z - z_0| - |z - w| \\
&> \eta - (\eta - \varepsilon) && \text{(since $|z - w| = |w - z| < \eta - \varepsilon$)} \\
&= \varepsilon .
\end{aligned}
\]

Therefore the complement of $\bar{D}_\varepsilon(z_0)$ is open, whence $\bar{D}_\varepsilon(z_0)$ is closed.


We should remark that the closed disk $\bar{D}_\varepsilon(z_0)$ is not open, since any open disk around a
point $z$ at the boundary of $\bar{D}_\varepsilon(z_0)$ (that is, for which $|z - z_0| = \varepsilon$) contains points which
are not included in $\bar{D}_\varepsilon(z_0)$.


Notice that it follows from this definition that every open set is made out of the union of
(a possibly uncountable number of) open disks.


2.1.2 Complex-valued functions 


In this section we will discuss complex-valued functions. 

We start with a rather trivial case of a complex-valued function. Suppose 
that $f$ is a complex-valued function of a real variable. That means that if $x$ is
a real number, $f(x)$ is a complex number, which can be decomposed into its
real and imaginary parts: $f(x) = u(x) + i v(x)$, where $u$ and $v$ are real-valued
functions of a real variable; that is, the objects you are familiar with from
calculus. We say that $f$ is continuous at $x_0$ if $u$ and $v$ are continuous at $x_0$.


Let us recall the definition of continuity. Let $f$ be a real-valued function of a real variable.
We say that $f$ is continuous at $x_0$, if for every $\varepsilon > 0$, there is a $\delta > 0$ such that
$|f(x) - f(x_0)| < \varepsilon$ whenever $|x - x_0| < \delta$. A function is said to be continuous if it is continuous
at all points where it is defined.


Now consider a complex-valued function $f$ of a complex variable $z$. We
say that $f$ is continuous at $z_0$ if given any $\varepsilon > 0$, there exists a $\delta > 0$ such
that $|f(z) - f(z_0)| < \varepsilon$ whenever $|z - z_0| < \delta$. Heuristically, another way of
saying that $f$ is continuous at $z_0$ is that $f(z)$ tends to $f(z_0)$ as $z$ approaches
$z_0$. This is equivalent to the continuity of the real and imaginary parts of $f$
thought of as real-valued functions on the complex plane. Explicitly, if we
write $f = u + iv$ and $z = x + iy$, $u(x,y)$ and $v(x,y)$ are real-valued functions
on the complex plane. Then the continuity of $f$ at $z_0 = x_0 + i y_0$ is equivalent
to the continuity of $u$ and $v$ at the point $(x_0, y_0)$.


“Graphing” complex-valued functions 


Complex-valued functions of a complex variable are harder to visualise than
their real analogues. To visualise a real function $f : \mathbb{R} \to \mathbb{R}$, one simply
graphs the function: its graph being the curve $y = f(x)$ in the $(x,y)$-plane.
A complex-valued function of a complex variable $f : \mathbb{C} \to \mathbb{C}$ maps complex
numbers to complex numbers, or equivalently points in the $(x,y)$-plane to
points in the $(u,v)$-plane. Hence its graph defines a surface $u = u(x,y)$ and
$v = v(x,y)$ in the four-dimensional space with coordinates $(x,y,u,v)$, which
is not so easy to visualise. Instead one resorts to investigating what the
function does to regions in the complex plane. Traditionally one considers
two planes: the $z$-plane whose points have coordinates $(x,y)$ corresponding
to the real and imaginary parts of $z = x + iy$, and the $w$-plane whose points
have coordinates $(u,v)$ corresponding to $w = u + iv$. Any complex-valued
function f of the complex variable z maps points in the z-plane to points 
in the w-plane via w = f(z). A lot can be learned from a complex function 
by analysing the image in the w-plane of certain sets in the z-plane. We 
will have plenty of opportunities to use this throughout the course of these 
lectures. 


With the picture of the $z$- and $w$-planes in mind, one can restate the continuity of a
function very simply in terms of open sets. In fact, this was the historical reason why the
notion of open sets was introduced in mathematics. As we saw, a complex-valued function
$f$ of a complex variable $z$ defines a mapping from the complex $z$-plane to the complex
$w$-plane. The function $f$ is continuous at $z_0$ if for every neighbourhood $U$ of $w_0 = f(z_0)$
in the $w$-plane, the set

\[ f^{-1}(U) = \{ z \mid f(z) \in U \} \]

contains a neighbourhood of $z_0$ in the $z$-plane. In particular, $f$ is continuous everywhere
precisely when $f^{-1}(U)$ is open in the $z$-plane for every open set $U$ in the $w$-plane.
Checking that both definitions of continuity agree is left as an exercise.


2.1.3 Differentiability and analyticity 


Let us now discuss differentiation of complex-valued functions. Again, if $f =
u + iv$ is a complex-valued function of a real variable $x$, then the derivative
of $f$ at the point $x_0$ is defined by

\[ f'(x_0) = u'(x_0) + i\, v'(x_0) , \]

where $u'$ and $v'$ are the derivatives of $u$ and $v$ respectively. In other words,
we extend the operation of differentiation complex-linearly. There is nothing
novel here.


Differentiability and the Cauchy—Riemann equations 


The situation is drastically different when we consider a complex-valued
function $f = u + iv$ of a complex variable $z = x + iy$. As in calculus, let us attempt
to define its derivative by

\[ f'(z_0) = \lim_{\Delta z \to 0} \frac{f(z_0 + \Delta z) - f(z_0)}{\Delta z} . \tag{2.6} \]


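The first thing that we notice is that $\Delta z$, being a complex number, can
approach zero in more than one way. If we write $\Delta z = \Delta x + i\,\Delta y$, then we
can approach zero along the real axis $\Delta y = 0$ or along the imaginary axis
$\Delta x = 0$, or indeed along any direction. For the derivative to exist, the answer
should not depend on how $\Delta z$ tends to 0. Let us see what this entails. Let
us write $f = u + iv$ and $z_0 = x_0 + i y_0$, so that

\begin{align*}
f(z_0) &= u(x_0, y_0) + i\, v(x_0, y_0) \\
f(z_0 + \Delta z) &= u(x_0 + \Delta x, y_0 + \Delta y) + i\, v(x_0 + \Delta x, y_0 + \Delta y) .
\end{align*}

Then

\[ f'(z_0) = \lim_{\Delta z \to 0} \frac{\Delta u(x_0, y_0) + i\, \Delta v(x_0, y_0)}{\Delta x + i\, \Delta y} , \]

where

\begin{align*}
\Delta u(x_0, y_0) &= u(x_0 + \Delta x, y_0 + \Delta y) - u(x_0, y_0) \\
\Delta v(x_0, y_0) &= v(x_0 + \Delta x, y_0 + \Delta y) - v(x_0, y_0) .
\end{align*}

Let us first take the limit $\Delta z \to 0$ by first taking $\Delta y \to 0$ and then $\Delta x \to 0$;
in other words, we let $\Delta z \to 0$ along the real axis. Then

\begin{align*}
f'(z_0) &= \lim_{\Delta x \to 0} \lim_{\Delta y \to 0} \frac{\Delta u(x_0, y_0) + i\, \Delta v(x_0, y_0)}{\Delta x + i\, \Delta y} \\
        &= \lim_{\Delta x \to 0} \frac{\Delta u(x_0, y_0) + i\, \Delta v(x_0, y_0)}{\Delta x} \\
        &= \left. \frac{\partial u}{\partial x} \right|_{(x_0, y_0)} + i \left. \frac{\partial v}{\partial x} \right|_{(x_0, y_0)} .
\end{align*}

Now let us take the limit $\Delta z \to 0$ by first taking $\Delta x \to 0$ and then $\Delta y \to 0$;
in other words, we let $\Delta z \to 0$ along the imaginary axis. Then

\begin{align*}
f'(z_0) &= \lim_{\Delta y \to 0} \lim_{\Delta x \to 0} \frac{\Delta u(x_0, y_0) + i\, \Delta v(x_0, y_0)}{\Delta x + i\, \Delta y} \\
        &= \lim_{\Delta y \to 0} \frac{\Delta u(x_0, y_0) + i\, \Delta v(x_0, y_0)}{i\, \Delta y} \\
        &= -i \left. \frac{\partial u}{\partial y} \right|_{(x_0, y_0)} + \left. \frac{\partial v}{\partial y} \right|_{(x_0, y_0)} .
\end{align*}

These two expressions for $f'(z_0)$ agree if and only if the following equations
are satisfied at $(x_0, y_0)$:

\[ \frac{\partial u}{\partial x} = \frac{\partial v}{\partial y} \qquad\text{and}\qquad \frac{\partial v}{\partial x} = -\frac{\partial u}{\partial y} . \tag{2.7} \]

These equations are called the Cauchy-Riemann equations.
We say that the function $f$ is differentiable at $z_0$ if $f'(z_0)$ is well-defined
at $z_0$. For a differentiable function $f$ we have just seen that

\[ f'(z) = \frac{\partial u}{\partial x} + i\, \frac{\partial v}{\partial x} = \frac{\partial v}{\partial y} - i\, \frac{\partial u}{\partial y} . \]

We have just shown that a necessary condition for $f$ to be differentiable at
$z_0$ is that its real and imaginary parts obey the Cauchy-Riemann equations
at $(x_0, y_0)$. Conversely, it can be shown that this condition is also sufficient
provided that the partial derivatives of $u$ and $v$ are continuous.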
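
The necessary condition just derived can be probed numerically. The following Python sketch is only an illustration under stated assumptions: the helper names `partials` and `cr_residuals`, the step size `h` and the sample functions are hypothetical choices and not part of these notes. It estimates the partial derivatives of $u$ and $v$ by central differences and returns the combinations $u_x - v_y$ and $v_x + u_y$, both of which should be close to zero wherever the Cauchy-Riemann equations hold.

```python
def partials(f, x, y, h=1e-6):
    """Central-difference estimates of (u_x, u_y, v_x, v_y) for f = u + i v at (x, y)."""
    fx = (f(complex(x + h, y)) - f(complex(x - h, y))) / (2 * h)
    fy = (f(complex(x, y + h)) - f(complex(x, y - h))) / (2 * h)
    return fx.real, fy.real, fx.imag, fy.imag


def cr_residuals(f, x, y, h=1e-6):
    """Return (u_x - v_y, v_x + u_y); both vanish where Cauchy-Riemann holds."""
    ux, uy, vx, vy = partials(f, x, y, h)
    return ux - vy, vx + uy


if __name__ == "__main__":
    # z**2 is a polynomial in z alone, so both residuals should be tiny.
    print(cr_residuals(lambda z: z * z, 0.3, -1.2))
    # |z|**2 has u = x**2 + y**2 and v = 0, so the residuals are about (2x, 2y).
    print(cr_residuals(lambda z: abs(z) ** 2, 0.5, 0.25))
```

Small residuals at a point are evidence, not proof, of differentiability there: as stated above, the converse also requires the partial derivatives to be continuous.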

We say that the function $f$ is analytic in a neighbourhood $U$ of $z_0$ if it is
differentiable everywhere in $U$. We say that a function is entire if it is analytic
in the whole complex plane. Often the terms regular and holomorphic are
used as synonyms for analytic.

For example, the function $f(z) = z$ is entire. We can check this either by
verifying the Cauchy-Riemann equations or else simply by noticing that

\begin{align*}
f'(z_0) &= \lim_{\Delta z \to 0} \frac{f(z_0 + \Delta z) - f(z_0)}{\Delta z} \\
        &= \lim_{\Delta z \to 0} \frac{z_0 + \Delta z - z_0}{\Delta z} \\
        &= \lim_{\Delta z \to 0} \frac{\Delta z}{\Delta z} \\
        &= 1 ,
\end{align*}

whence it is well-defined for all $z_0$.
On the other hand, the function $f(z) = z^*$ is not differentiable anywhere:

\begin{align*}
f'(z_0) &= \lim_{\Delta z \to 0} \frac{f(z_0 + \Delta z) - f(z_0)}{\Delta z} \\
        &= \lim_{\Delta z \to 0} \frac{(z_0 + \Delta z)^* - z_0^*}{\Delta z} \\
        &= \lim_{\Delta z \to 0} \frac{\Delta z^*}{\Delta z} ;
\end{align*}

whence if we let $\Delta z$ tend to zero along real values, we would find that $f'(z_0) =
1$, whereas if we let $\Delta z$ tend to zero along imaginary values we would
find that $f'(z_0) = -1$. We could have reached the same conclusion via
the Cauchy-Riemann equations with $u(x,y) = x$ and $v(x,y) = -y$, which
violates the first of the Cauchy-Riemann equations.
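
The direction dependence of the difference quotient for $f(z) = z^*$ can also be seen numerically. In this minimal sketch the function name `difference_quotient`, the base point `z0` and the increment size are illustrative assumptions. It evaluates the quotient for a real, an imaginary and a diagonal increment.

```python
def difference_quotient(f, z0, dz):
    """(f(z0 + dz) - f(z0)) / dz for one small complex increment dz."""
    return (f(z0 + dz) - f(z0)) / dz


def conj(z):
    """The map z -> z*, which the text shows is nowhere differentiable."""
    return z.conjugate()


z0 = 1.5 - 0.5j                                               # arbitrary base point
along_real = difference_quotient(conj, z0, 1e-6)              # close to +1
along_imag = difference_quotient(conj, z0, 1e-6j)             # close to -1
along_diag = difference_quotient(conj, z0, 1e-6 * (1 + 1j))   # close to -i
```

Since $\Delta z^*/\Delta z$ has modulus one and a phase that depends only on the direction of $\Delta z$, shrinking the increment does not make the three values agree.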

It is important to realise that analyticity, unlike differentiability, is not
a property of a function at a point, but on an open set of points. The
reason for this is to be able to eliminate from the class of interesting functions,
functions which may be differentiable at a point but nowhere else. Whereas
this is a rarity in calculus⁴, it is a very common occurrence for complex-valued
functions of a complex variable. For example, consider the function
$f(z) = |z|^2$. This function has $u(x,y) = x^2 + y^2$ and $v(x,y) = 0$. Therefore
the Cauchy-Riemann equations are only satisfied at the origin in the complex
plane:

\[ \frac{\partial u}{\partial x} = 2x = \frac{\partial v}{\partial y} = 0 \qquad\text{and}\qquad \frac{\partial v}{\partial x} = 0 = -\frac{\partial u}{\partial y} = -2y \qquad\Longrightarrow\qquad x = y = 0 . \]

⁴ Try to come up with a real-valued function of a real variable which is differentiable
only at the origin, for example.


Relation with harmonic functions


Analytic functions are intimately related to harmonic functions. We say that
a real-valued function $h(x,y)$ on the plane is harmonic if it obeys Laplace's
equation:

\[ \frac{\partial^2 h}{\partial x^2} + \frac{\partial^2 h}{\partial y^2} = 0 . \tag{2.8} \]

In fact, as we now show, the real and imaginary parts of an analytic function
are harmonic. Let $f = u + iv$ be analytic in some open set of the complex
plane. Then,

\begin{align*}
\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}
  &= \frac{\partial}{\partial x} \frac{\partial u}{\partial x} + \frac{\partial}{\partial y} \frac{\partial u}{\partial y} \\
  &= \frac{\partial}{\partial x} \frac{\partial v}{\partial y} - \frac{\partial}{\partial y} \frac{\partial v}{\partial x} && \text{(using Cauchy-Riemann)} \\
  &= \frac{\partial^2 v}{\partial x\, \partial y} - \frac{\partial^2 v}{\partial y\, \partial x} \\
  &= 0 .
\end{align*}

A similar calculation shows that v is also harmonic. This result is important 
in applications because it shows that one can obtain solutions of a second 
order partial differential equation by solving a system of first order partial 
differential equations. It is particularly important in this case because we 
will be able to obtain solutions of the Cauchy—Riemann equations without 
really solving these equations. 

Given a harmonic function $u$ we say that another harmonic function $v$ is
its harmonic conjugate if the complex-valued function $f = u + iv$ is analytic.
For example, consider the function $u(x,y) = xy - x + y$. It is clearly harmonic
since

\[ \frac{\partial u}{\partial x} = y - 1 \qquad\text{and}\qquad \frac{\partial u}{\partial y} = x + 1 , \]

whence

\[ \frac{\partial^2 u}{\partial x^2} = \frac{\partial^2 u}{\partial y^2} = 0 . \]

By a harmonic conjugate we mean any function $v(x,y)$ which together with
$u(x,y)$ satisfies the Cauchy-Riemann equations:

\[ \frac{\partial v}{\partial x} = -\frac{\partial u}{\partial y} = -x - 1 \qquad\text{and}\qquad \frac{\partial v}{\partial y} = \frac{\partial u}{\partial x} = y - 1 . \]

We can integrate the first of the above equations, to obtain

\[ v(x,y) = -\tfrac12 x^2 - x + \psi(y) , \]

for $\psi$ an arbitrary function of $y$ which is to be determined from the second
of the Cauchy-Riemann equations. Doing this one finds

\[ \psi'(y) = y - 1 , \]

which is solved by $\psi(y) = \tfrac12 y^2 - y + c$, where $c$ is any constant. Therefore,
the function $f = u + iv$ becomes

\[ f(x,y) = xy - x + y + i \left( -\tfrac12 x^2 + \tfrac12 y^2 - x - y + c \right) . \]

We can try to write this down in terms of $z$ and $z^*$ by making the substitutions
$x = \tfrac12 (z + z^*)$ and $y = -i\, \tfrac12 (z - z^*)$. After a little bit of algebra, we find

\[ f = -\tfrac12 i z^2 - (1 + i) z + i c . \]

Notice that all the $z^*$ dependence has dropped out. We will see below that
this is a sign of analyticity.
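
As a sanity check on the worked example, the following Python sketch compares $u + iv$ with the closed form in $z$ at a few sample points and estimates the Laplacians of $u$ and $v$ with a five-point stencil. The function names, the stencil step and the sample points are assumptions made purely for illustration.

```python
def u(x, y):
    """The harmonic function u(x, y) = x y - x + y from the example."""
    return x * y - x + y


def v(x, y, c=0.0):
    """Its harmonic conjugate, with the integration constant c."""
    return -0.5 * x**2 + 0.5 * y**2 - x - y + c


def f_closed(z, c=0.0):
    """The same function written in terms of z alone."""
    return -0.5j * z**2 - (1 + 1j) * z + 1j * c


def laplacian(h, x, y, step=1e-3):
    """Five-point finite-difference estimate of h_xx + h_yy at (x, y)."""
    total = h(x + step, y) + h(x - step, y) + h(x, y + step) + h(x, y - step)
    return (total - 4 * h(x, y)) / step**2


if __name__ == "__main__":
    x, y = 0.7, -1.3
    print(complex(u(x, y), v(x, y)) - f_closed(complex(x, y)))  # close to 0
    print(laplacian(u, x, y), laplacian(v, x, y))               # both close to 0
```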


2.1.4 Polynomials and rational functions 


We now start to build up some examples of analytic functions. We have 
already seen that the function f(z) = z is entire. In this section we will 
generalise this to show that so is any polynomial P(z). We will also see that 
ratios of polynomials are also analytic everywhere but on a finite set of points 
in the complex plane where the denominator vanishes. 

There are many ways to do this, but one illuminating way is to show 
that complex linear combinations of analytic functions are analytic and that 
products of analytic functions are analytic functions. Let $f(z)$ be an analytic
function on some open subset $U \subset \mathbb{C}$, and let $\alpha$ be a complex number. Then
it is easy to see that the function $\alpha f(z)$ is also analytic on $U$. Indeed, from
the definition of the derivative, we see that

\[ (\alpha f)'(z_0) = \alpha f'(z_0) , \tag{2.9} \]

which exists whenever $f'(z_0)$ exists.

Let $f(z)$ and $g(z)$ be analytic functions on the same open subset $U \subset \mathbb{C}$.
Then the functions $f(z) + g(z)$ and $f(z) g(z)$ are also analytic. Again from
the definition (2.6) of the derivative,

\begin{align*}
(f + g)'(z_0) &= f'(z_0) + g'(z_0) \tag{2.10} \\
(f g)'(z_0) &= f'(z_0)\, g(z_0) + f(z_0)\, g'(z_0) , \tag{2.11}
\end{align*}

which exist whenever $f'(z_0)$ and $g'(z_0)$ exist.


The only tricky bit in the above result is that we have to make sure that $f$ and $g$ are
analytic in the same open set $U$. Normally it happens that $f$ and $g$ are analytic in
different open sets, say, $U_1$ and $U_2$ respectively. Then the sum $f(z) + g(z)$ and product
$f(z) g(z)$ are analytic in the intersection $U = U_1 \cap U_2$, which is also open. This is easy to
see. Let us assume that $U$ is not empty, otherwise the statement is trivially satisfied. Let
$z \in U$. This means that $z \in U_1$ and $z \in U_2$. Because each $U_i$ is open there are positive
real numbers $\varepsilon_i$ such that $D_{\varepsilon_i}(z)$ lies inside $U_i$. Let $\varepsilon = \min(\varepsilon_1, \varepsilon_2)$ be the smallest of the
$\varepsilon_i$. Then $D_\varepsilon(z) \subseteq D_{\varepsilon_i}(z) \subseteq U_i$ for $i = 1, 2$. Therefore $D_\varepsilon(z) \subseteq U$, and $U$ is open.


It is important to realise that only finite intersections of open sets will again be open in
general. Consider, for example, the open disks $D_{1/n}(0)$ of radius $1/n$ about the origin,
for $n = 1, 2, 3, \ldots$. Their intersection consists of the points $z$ with $|z| < 1/n$ for all
$n = 1, 2, 3, \ldots$. Clearly, if $z \neq 0$ then there will be some positive integer $n$ for which
$|z| > 1/n$. Therefore the only point in the intersection of all the $D_{1/n}(0)$ is the origin
itself. But this set is clearly not open, since it does not contain any open disk with nonzero
radius. More generally, nonempty sets consisting of a finite number of points are never open,
although they are closed.


Therefore we see that (finite) sums and products of analytic functions
are analytic with the same domain of analyticity. In particular, sums and
products of entire functions are again entire. As a result, from the fact
that the function $f(z) = z$ is entire, we see that any polynomial $P(z) =
\sum_{n=0}^{N} a_n z^n$ of finite degree $N$ is also an entire function. Indeed, its derivative
is given by

\[ P'(z) = \sum_{n=1}^{N} n\, a_n z^{n-1} , \]

as follows from the above formulae for the derivatives of sums and products.
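
The derivative formula translates directly into a few lines of code. The sketch below is a hypothetical illustration (the names `poly` and `poly_derivative_coeffs` and the coefficient-list convention `coeffs[n]` $= a_n$ are assumptions): it evaluates a polynomial with complex coefficients by Horner's rule and builds the coefficient list of $P'$.

```python
def poly(coeffs, z):
    """Evaluate P(z) = sum_n coeffs[n] * z**n by Horner's rule."""
    result = 0j
    for a in reversed(coeffs):
        result = result * z + a
    return result


def poly_derivative_coeffs(coeffs):
    """Coefficients of P'(z) = sum_{n >= 1} n * a_n * z**(n - 1)."""
    return [n * a for n, a in enumerate(coeffs)][1:]


if __name__ == "__main__":
    p = [1, 2j, 0, 3]                 # P(z) = 1 + 2i z + 3 z^3
    dp = poly_derivative_coeffs(p)    # P'(z) = 2i + 9 z^2
    print(dp, poly(dp, 1 + 1j))
```

Comparing `poly(dp, z)` with a difference quotient of `poly(p, z)` for increments in several directions gives a quick check that the result does not depend on the direction of approach.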

We will see later on in the course that to some extent we will be able 
to describe all analytic functions (at least locally) in terms of polynomials, 
provided that we allow the polynomials to have arbitrarily high degree; in 
other words, in terms of power series. 

There are two more constructions which start from analytic functions and
yield an analytic function: quotients and composition. Let $f(z)$ and $g(z)$ be
analytic functions on some open subset $U \subset \mathbb{C}$. Then the quotient $f(z)/g(z)$
is continuous away from the zeros of $g(z)$, that is, on the set of points where
$g(z) \neq 0$, which can be shown (see below) to be an open set. If $g(z_0) \neq 0$, then
from the definition (2.6) of the derivative it follows that

\[ \left( \frac{f}{g} \right)'(z_0) = \frac{f'(z_0)\, g(z_0) - f(z_0)\, g'(z_0)}{g(z_0)^2} . \]


To see that the subset of points $z$ for which $g(z) \neq 0$ is open, we need only realise that
this set is the inverse image $g^{-1}(\{0\}^c)$ under $g$ of the complement of $\{0\}$. The result then
follows because the complement of $\{0\}$ is open and $g$ is continuous, so that $g^{-1}(\text{open})$ is
open.


By a rational function we mean the ratio of two polynomials. Let $P(z)$
and $Q(z)$ be two polynomials. Then the rational function

\[ \frac{P(z)}{Q(z)} \]

is analytic away from the zeros of $Q(z)$.


We have been tacitly assuming that every (non-constant) polynomial $Q(z)$ has zeros. This
result is known as the Fundamental Theorem of Algebra and although it is of course
intuitive and in agreement with our daily experience with polynomials, its proof is surprisingly
difficult. There are three standard proofs: one is purely algebraic, but it is long and
arduous, one uses algebraic topology and the other uses complex analysis. We will in fact
see this third proof later on in Section 2.2.6.


Finally let $g(z)$ be analytic in an open subset $U \subset \mathbb{C}$ and let $f(z)$ be
analytic in some open subset containing $g(U)$, the image of $U$ under $g$. Then
the composition $f \circ g$ defined by $(f \circ g)(z) = f(g(z))$ is also analytic in $U$.
In fact, its derivative can be computed using the chain rule,

\[ (f \circ g)'(z_0) = f'(g(z_0))\, g'(z_0) . \tag{2.12} \]


You may wonder whether g(U) is an open set, for U open and g analytic. This is indeed 
true: it is called the open mapping property of analytic functions. We may see this later 
on in the course. 


It is clear that if $f$ and $g$ are rational functions then so is their composition
$f \circ g$, so one only ever constructs new functions this way when one of the
functions being composed is not rational. We will see plenty of examples of
this as the lectures progress.


Another look at the Cauchy—Riemann equations 


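Finally let us mention a different way to understand the Cauchy-Riemann
equations, at least for the case of rational functions. Notice that the above
polynomials and rational functions share the property that they do not depend
on $z^*$ but only on $z$. Suppose that one is given a rational function
where the dependence on $x$ and $y$ has been made explicit. For example,

\[ f(x,y) = \frac{x - 1 - iy}{(x-1)^2 + y^2} . \]

In order to see whether $f$ is analytic one would have to apply the
Cauchy-Riemann equations, which can get rather messy when the rational function
is complicated. It turns out that it is not necessary to do this. Instead one
can try to re-express the function in terms of $z$ and $z^*$ by the substitutions

\[ x = \frac{z + z^*}{2} \qquad\text{and}\qquad y = \frac{z - z^*}{2i} . \]

Then, the rational function $f(x,y)$ is analytic if and only if the $z^*$ dependence
cancels. In the above example, one can see that this is indeed the case.
Indeed, rewriting $f(x,y)$ in terms of $z$ and $z^*$ we see that

\[ f(x,y) = \frac{z^* - 1}{(z - 1)(z^* - 1)} = \frac{1}{z - 1} , \]

whence the $z^*$ dependence has dropped out. We therefore expect that the
Cauchy-Riemann equations will be satisfied. Indeed, one has that

\[ u(x,y) = \frac{x - 1}{(x-1)^2 + y^2} \qquad\text{and}\qquad v(x,y) = \frac{-y}{(x-1)^2 + y^2} , \]

and after some algebra,

\begin{align*}
\frac{\partial u}{\partial x} &= \frac{-(x-1)^2 + y^2}{\left( (x-1)^2 + y^2 \right)^2} = \frac{\partial v}{\partial y} \\
\frac{\partial u}{\partial y} &= \frac{-2 (x-1)\, y}{\left( (x-1)^2 + y^2 \right)^2} = -\frac{\partial v}{\partial x} .
\end{align*}

The reason this works is the following. Let us think formally of $z$ and $z^*$ as
independent variables for the plane, like $x$ and $y$. Then we have that

\[ \frac{\partial f}{\partial z^*} = \frac{\partial f}{\partial (x - iy)} = \frac12 \frac{\partial f}{\partial x} + \frac{i}{2} \frac{\partial f}{\partial y} . \]

Let us break up $f$ into its real and imaginary parts: $f(x,y) = u(x,y) +
i v(x,y)$. Then,

\begin{align*}
\frac{\partial f}{\partial z^*}
  &= \frac12 \left( \frac{\partial u}{\partial x} + i \frac{\partial v}{\partial x} \right) + \frac{i}{2} \left( \frac{\partial u}{\partial y} + i \frac{\partial v}{\partial y} \right) \\
  &= \frac12 \left( \frac{\partial u}{\partial x} - \frac{\partial v}{\partial y} \right) + \frac{i}{2} \left( \frac{\partial v}{\partial x} + \frac{\partial u}{\partial y} \right) .
\end{align*}

Therefore we see that the Cauchy-Riemann equations are equivalent to the
condition

\[ \frac{\partial f}{\partial z^*} = 0 . \tag{2.13} \]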
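
Condition (2.13) suggests a compact numerical test of analyticity. The sketch below is an illustration only, under the assumption that $\partial f/\partial z^*$ is taken to be $\tfrac12(\partial f/\partial x + i\,\partial f/\partial y)$; the names `d_dzbar` and `f_example` and the step size are hypothetical. It estimates this combination by central differences for the rational example of this section and for two functions which do depend on $z^*$.

```python
def d_dzbar(f, z, h=1e-6):
    """Estimate df/dz* = (f_x + i f_y) / 2 at z by central differences."""
    fx = (f(z + h) - f(z - h)) / (2 * h)
    fy = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)
    return 0.5 * (fx + 1j * fy)


def f_example(z):
    """The example (x - 1 - i y) / ((x - 1)^2 + y^2), written in x and y."""
    x, y = z.real, z.imag
    return complex(x - 1, -y) / ((x - 1) ** 2 + y ** 2)


if __name__ == "__main__":
    z = 2.0 + 1.0j
    print(abs(d_dzbar(f_example, z)))             # close to 0: no z* dependence
    print(d_dzbar(lambda w: w.conjugate(), z))    # close to 1
    print(d_dzbar(lambda w: abs(w) ** 2, z))      # close to z, since |z|^2 = z z*
```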


2.1.5 The complex exponential and related functions 


There are many other analytic functions besides the rational functions. Some 
of them are related to the exponential function. 

Let $z = x + iy$ be a complex number and define the complex exponential
$\exp(z)$ (also written $e^z$) to be the function

\[ \exp(z) = \exp(x + iy) = e^x \left( \cos y + i \sin y \right) . \]

We will first check that this function is entire. Decomposing it into real and
imaginary parts, we see that

\[ u(x,y) = e^x \cos y \qquad\text{and}\qquad v(x,y) = e^x \sin y . \]

It is easy to check that the Cauchy-Riemann equations are satisfied
everywhere on the complex plane:

\[ \frac{\partial u}{\partial x} = e^x \cos y = \frac{\partial v}{\partial y} \qquad\text{and}\qquad \frac{\partial v}{\partial x} = e^x \sin y = -\frac{\partial u}{\partial y} . \]

Therefore the function is entire and its derivative is given by

\begin{align*}
\exp'(z) &= \frac{\partial u}{\partial x} + i \frac{\partial v}{\partial x} \\
         &= e^x \cos y + i e^x \sin y \\
         &= \exp(z) .
\end{align*}

The exponential function obeys the following addition property

\[ \exp(z_1 + z_2) = \exp(z_1) \exp(z_2) , \tag{2.14} \]

which has as a consequence the celebrated De Moivre's Formula:

\[ (\cos\theta + i \sin\theta)^n = \cos(n\theta) + i \sin(n\theta) , \]

obtained simply by noticing that $\exp(i n \theta) = \exp(i\theta)^n$.

The exponential is also a periodic function, with period $2\pi i$. In fact from
the periodicity of trigonometric functions, we see that $\exp(2\pi i) = 1$ and
hence, using the addition property (2.14), we find

\[ \exp(z + 2\pi i) = \exp(z) . \tag{2.15} \]

This means that the exponential is not one-to-one, in sharp contrast with the
real exponential function. It follows from the definition of the exponential
function that

\[ \exp(z_1) = \exp(z_2) \quad\text{if and only if}\quad z_1 = z_2 + 2\pi i k \ \text{ for some integer } k . \]
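
These properties are easy to confirm numerically. In the sketch below the names `exp_def` and `de_moivre_gap` are hypothetical helpers: the first implements the definition $e^x(\cos y + i \sin y)$ directly, so that it can be compared with the library function `cmath.exp`, and the second measures the discrepancy between the two sides of De Moivre's Formula.

```python
import cmath
import math


def exp_def(z):
    """exp(z) computed from the definition e^x (cos y + i sin y)."""
    return math.exp(z.real) * complex(math.cos(z.imag), math.sin(z.imag))


def de_moivre_gap(theta, n):
    """|(cos t + i sin t)^n - (cos(n t) + i sin(n t))|, expected to be tiny."""
    lhs = complex(math.cos(theta), math.sin(theta)) ** n
    rhs = complex(math.cos(n * theta), math.sin(n * theta))
    return abs(lhs - rhs)


if __name__ == "__main__":
    z1, z2 = 0.4 - 1.1j, -0.7 + 2.5j
    print(abs(exp_def(z1 + z2) - exp_def(z1) * exp_def(z2)))   # addition property
    print(abs(exp_def(z1 + 2j * math.pi) - exp_def(z1)))       # period 2 pi i
    print(de_moivre_gap(0.3, 7))
```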


We can divide up the complex plane into horizontal strips of height $2\pi$ in
such a way that in each strip the exponential function is one-to-one. To see
this define the following subsets of the complex plane

\[ S_k = \{ x + iy \in \mathbb{C} \mid (2k-1)\pi < y \le (2k+1)\pi \} , \]

for $k = 0, \pm 1, \pm 2, \ldots$, as shown in Figure 2.1.

Then it follows that if $z_1$ and $z_2$ belong to the same set $S_k$, then $\exp(z_1) =
\exp(z_2)$ implies that $z_1 = z_2$. Each of the sets $S_k$ is known as a fundamental
region for the exponential function. The basic property satisfied by a fundamental
region of a periodic function is that if one knows the behaviour of the
function on the fundamental region, one can use the periodicity to find out
the behaviour of the function everywhere, and that it is the smallest region
with that property. The periodicity of the complex exponential will have as
a consequence that the complex logarithm will not be single-valued.

Figure 2.1: Fundamental regions of the complex exponential function.


Complex trigonometric functions 


We can also define complex trigonometric functions starting from the complex
exponential. Let $z = x + iy$ be a complex number. Then we define the
complex sine and cosine functions as

\[ \sin(z) = \frac{e^{iz} - e^{-iz}}{2i} \qquad\text{and}\qquad \cos(z) = \frac{e^{iz} + e^{-iz}}{2} . \]

Being linear combinations of the entire functions $\exp(\pm i z)$, they themselves
are entire. Their derivatives are

\[ \sin'(z) = \cos(z) \qquad\text{and}\qquad \cos'(z) = -\sin(z) . \]

The complex trigonometric functions obey many of the same properties
of the real sine and cosine functions, with which they agree when $z$ is real.
For example,

\[ \cos(z)^2 + \sin(z)^2 = 1 , \]

and they are periodic with period $2\pi$. However, there is one important
difference between the real and complex trigonometric functions: whereas
the real sine and cosine functions are bounded, their complex counterparts
are not. To see this let us break them up into real and imaginary parts:

\begin{align*}
\sin(x + iy) &= \sin x \cosh y + i \cos x \sinh y \\
\cos(x + iy) &= \cos x \cosh y - i \sin x \sinh y .
\end{align*}

We see that the appearance of the hyperbolic functions means that the
complex sine and cosine functions are not bounded.

Finally, let us define the complex hyperbolic functions. If $z = x + iy$,
then let

\[ \sinh(z) = \frac{e^z - e^{-z}}{2} \qquad\text{and}\qquad \cosh(z) = \frac{e^z + e^{-z}}{2} . \]

In contrast with the real hyperbolic functions, they are not independent from
the trigonometric functions. Indeed, we see that

\[ \sinh(iz) = i \sin(z) \qquad\text{and}\qquad \cosh(iz) = \cos(z) . \tag{2.16} \]
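
The decomposition of $\sin(x + iy)$ into real and imaginary parts, the unboundedness it implies and relation (2.16) can all be checked against the standard library. The helper name `sin_parts` below is a hypothetical choice made for this sketch.

```python
import cmath
import math


def sin_parts(x, y):
    """Real and imaginary parts of sin(x + i y) from the hyperbolic decomposition."""
    return math.sin(x) * math.cosh(y), math.cos(x) * math.sinh(y)


if __name__ == "__main__":
    z = complex(0.8, 2.5)
    print(cmath.sin(z), sin_parts(z.real, z.imag))
    print(abs(cmath.cos(z) ** 2 + cmath.sin(z) ** 2 - 1))   # identity holds for complex z
    print(abs(cmath.sinh(1j * z) - 1j * cmath.sin(z)))      # relation (2.16)
    print(abs(cmath.sin(10j)))                              # equals sinh(10): unbounded growth
```

Along the imaginary axis $|\sin(iy)| = |\sinh y|$, which grows exponentially with $|y|$; this is the unboundedness referred to in the text.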


Notice that one can also define other complex trigonometric functions: 
tan(z), cot(z), sec(z) and csc(z) in the usual way, as well as their hyperbolic 
counterparts. These functions obey many other properties, but we will not 
review them here. Instead we urge you to play with these functions until you 
are familiar with them. 


2.1.6 The complex logarithm 


This section introduces the logarithm of a complex number. We will see that 
in contrast with the real logarithm function which is only defined for posi- 
tive real numbers, the complex logarithm is defined for all nonzero complex 
numbers, but at a price: the function is not single-valued. This has to do 
with the periodicity of the complex exponential or, equivalently, with 
the multiple-valuedness of the argument arg(z). 

In this course we will use the notation ‘log’ for the natural logarithm, 
not for the logarithm base 10. Some people also use the notation ‘In’ for the 
natural logarithm, in order to distinguish it from the logarithm base 10; but 
we will not be forced to do this since we will only be concerned with the 
natural logarithm. 

By analogy with the real natural logarithm, we define the complex logarithm
as an inverse to the complex exponential function. In other words, we
say that a logarithm of a nonzero complex number $z$ is any complex number
$w$ such that $\exp(w) = z$; that is, we define the function $\log(z)$ by

\[ w = \log(z) \quad\text{if}\quad \exp(w) = z . \tag{2.17} \]

From the periodicity (2.15) of the exponential function it follows that if
$w = \log(z)$ so is $w + 2\pi i k$ for any integer $k$. Therefore we see that $\log(z)$ is
a multiple-valued function. We met a multiple-valued function before: the
argument function $\arg(z)$. Clearly if $\theta = \arg(z)$ then so is $\theta + 2\pi k$ for any
integer $k$. This is no accident: the imaginary part of the $\log(z)$ function is
$\arg(z)$. To see this, let us write $z$ in polar form (2.2), $z = |z| \exp(i \arg(z))$,
and $w = \log(z) = u + iv$. By the above definition and using the addition
property (2.14), we have

\[ \exp(u + iv) = e^u e^{iv} = |z|\, e^{i \arg(z)} , \]

whence comparing polar forms we see that

\[ e^u = |z| \qquad\text{and}\qquad e^{iv} = e^{i \arg(z)} . \]

Since $u$ is a real number and $|z|$ is a positive real number, we can solve the
first equation for $u$ uniquely using the real logarithmic function, which in
order to distinguish it from the complex function $\log(z)$ we will write as Log:

\[ u = \operatorname{Log} |z| . \]

Similarly, we see that $v = \arg(z)$ solves the second equation. So does $v + 2\pi k$
for any integer $k$, but this is already taken into account by the
multiple-valuedness of the $\arg(z)$ function. Therefore we can write

\[ \log(z) = \operatorname{Log} |z| + i \arg(z) , \tag{2.18} \]

where we see that it is a multiple-valued function as a result of the fact that
so is $\arg(z)$. In terms of the principal value $\operatorname{Arg}(z)$ of the argument function,
we can also write $\log(z)$ as follows:

\[ \log(z) = \operatorname{Log} |z| + i \operatorname{Arg}(z) + 2\pi i k \quad\text{for } k = 0, \pm 1, \pm 2, \ldots , \tag{2.19} \]

which makes the multiple-valuedness manifest.
For example, whereas the real logarithm of 1 is simply 0, the complex
logarithm is given by

\[ \log(1) = \operatorname{Log} |1| + i \arg(1) = 0 + i\, 2\pi k \quad\text{for any integer } k . \]

As promised, we can now take the logarithm of negative real numbers. For
example,

\[ \log(-1) = \operatorname{Log} |-1| + i \arg(-1) = 0 + i \pi + i\, 2\pi k \quad\text{for any integer } k . \]
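
To make the multiple-valuedness concrete, here is a minimal Python sketch that lists several values of $\log(z)$. The function name `log_values` and its parameter `ks` are illustrative assumptions; the principal argument is taken from the standard library function `cmath.phase`.

```python
import cmath
import math


def log_values(z, ks=range(-2, 3)):
    """Values Log|z| + i Arg(z) + 2 pi i k of log(z), one for each integer k in ks."""
    if z == 0:
        raise ValueError("log(0) is undefined")
    principal = complex(math.log(abs(z)), cmath.phase(z))
    return [principal + 2j * math.pi * k for k in ks]


if __name__ == "__main__":
    for w in log_values(-1):
        # Every listed value is a logarithm of -1: exp(w) returns to -1.
        print(w, cmath.exp(w))
```

Each entry differs from its neighbour by $2\pi i$, and exponentiating any of them recovers the original number up to rounding.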


The complex logarithm obeys many of the algebraic identities that we
expect from the real logarithm, only that we have to take into account its
multiple-valuedness properly. Therefore an identity like

\[ \log(z_1 z_2) = \log(z_1) + \log(z_2) , \tag{2.20} \]

for nonzero complex numbers $z_1$ and $z_2$, is still valid in the sense that having
chosen a value (out of the infinitely many possible values) for $\log(z_1)$ and for
$\log(z_2)$, then there is a value of $\log(z_1 z_2)$ for which the above equation holds.
Or said in a different way, the identity holds up to integer multiples of $2\pi i$
or, as it is often said, modulo $2\pi i$:

\[ \log(z_1 z_2) - \log(z_1) - \log(z_2) = 2\pi i k \quad\text{for some integer } k . \]

Similarly we have

\[ \log(z_1/z_2) = \log(z_1) - \log(z_2) , \tag{2.21} \]

in the same sense as before, for any two nonzero complex numbers $z_1$ and $z_2$.
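
The phrase "modulo $2\pi i$" can be illustrated with the principal values returned by the standard library function `cmath.log`. The helper name `log_defect` is a hypothetical choice; it returns the integer $k$ by which the identity fails when all three logarithms are given their principal values.

```python
import cmath
import math


def log_defect(z1, z2):
    """Integer k with Log(z1 z2) - Log(z1) - Log(z2) = 2 pi i k (principal values)."""
    diff = cmath.log(z1 * z2) - cmath.log(z1) - cmath.log(z2)
    return round(diff.imag / (2 * math.pi))


if __name__ == "__main__":
    print(log_defect(1 + 1j, 1 + 1j))     # arguments pi/4 + pi/4 stay within (-pi, pi]
    print(log_defect(-1 + 1j, -1 + 1j))   # arguments 3pi/4 + 3pi/4 overshoot pi
```

The sample points are deliberately chosen away from the negative real axis, where the principal value is sensitive to the sign of a zero imaginary part in floating-point arithmetic.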


Choosing a branch for the logarithm 


We now turn to the discussion of the analyticity properties of the complex
logarithm function. In order to discuss the analyticity of a function, we need
to investigate its differentiability, and for this we need to be able to take
its derivative as in equation (2.6). Suppose we were to try to compute the
derivative of the function $\log(z)$ at some point $z_0$. Writing the derivative as
the limit of a quotient,

\[ \log'(z_0) = \lim_{\Delta z \to 0} \frac{\log(z_0 + \Delta z) - \log(z_0)}{\Delta z} , \]

we encounter an immediate obstacle: since the function $\log(z)$ is
multiple-valued we have to make sure that the two log functions in the numerator tend
to the same value in the limit, otherwise the limit will not exist. In other
words, we have to choose one of the infinitely many values for the log function
in a consistent way. This way of restricting the values of a multiple-valued
function to make it single-valued in some region (in the above example in
some neighbourhood of $z_0$) is called choosing a branch of the function. For
example, we define the principal branch Log of the logarithmic function to
be

\[ \operatorname{Log}(z) = \operatorname{Log} |z| + i \operatorname{Arg}(z) , \]

where $\operatorname{Arg}(z)$ is the principal value of $\arg(z)$. At first sight it might seem
that this notation is inconsistent, since we are using Log both for the real
logarithm and the principal branch of the complex logarithm. So let us make
sure that this is not the case. If $z$ is a positive real number, then $z = |z|$
and $\operatorname{Arg}(z) = 0$, whence $\operatorname{Log}(z) = \operatorname{Log} |z|$. Hence at least the notation is
consistent. The function $\operatorname{Log}(z)$ is single-valued, but at a price: it is no
longer continuous in the whole complex plane, since $\operatorname{Arg}(z)$ is not continuous
in the whole complex plane. As explained earlier, the principal
branch $\operatorname{Arg}(z)$ of the argument function is discontinuous along the negative
real axis. Indeed, let $z_\pm = -x \pm i \varepsilon$ where $x$ and $\varepsilon$ are positive numbers. In the
limit $\varepsilon \to 0$, $z_+$ and $z_-$ tend to the same point on the negative real axis from
the upper and lower half-planes respectively. Hence whereas $\lim_{\varepsilon \to 0} z_\pm = -x$,
the principal value of the logarithm obeys

\[ \lim_{\varepsilon \to 0} \operatorname{Log}(z_\pm) = \operatorname{Log}(x) \pm i \pi , \]

so that it is not a continuous function anywhere on the negative real axis, or
at the origin, where the function itself is not well-defined. The non-positive
real axis is known as a branch cut for this function and the origin is known
as a branch point.
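
The jump of the principal branch across its cut can be observed directly with the standard library, whose `cmath.log` returns the principal value. The helper name `jump_across_cut` and the offset `eps` are assumptions made for this sketch.

```python
import cmath
import math


def jump_across_cut(x, eps=1e-12):
    """Log(-x + i eps) - Log(-x - i eps) for x > 0; tends to 2 pi i as eps -> 0."""
    above = cmath.log(complex(-x, eps))    # just above the negative real axis
    below = cmath.log(complex(-x, -eps))   # just below it
    return above - below


if __name__ == "__main__":
    print(jump_across_cut(2.0), 2j * math.pi)
```

The real parts of the two values agree, while the imaginary parts approach $+\pi$ and $-\pi$ respectively, which reproduces the limits $\operatorname{Log}(x) \pm i\pi$ above.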

Let $D$ denote all the points in the complex plane except
for those which are real and non-positive; in other words,
$D$ is the complement of the non-positive real axis. It is easy
to check that $D$ is an open subset of the complex plane and
by construction, $\operatorname{Log}(z)$ is single-valued and continuous for
all points in $D$. We will now check that it is analytic there
as well. For this we need to compute its derivative. So let
$z_0 \in D$ be any point in $D$ and consider $w_0 = \operatorname{Log}(z_0)$. Letting $\Delta z = z - z_0$,
we can write the derivative of $w = \operatorname{Log}(z)$ at $z_0$ in the following form

\begin{align*}
\operatorname{Log}'(z_0) &= \lim_{z \to z_0} \frac{w - w_0}{z - z_0} \\
  &= \lim_{z \to z_0} \frac{1}{\dfrac{z - z_0}{w - w_0}} \\
  &= \lim_{w \to w_0} \frac{1}{\dfrac{z - z_0}{w - w_0}} ,
\end{align*}

where to reach the second line we used the fact that $w = w_0$ implies $z = z_0$
(single-valuedness of the exponential function), and to reach the third line
we used the continuity of $\operatorname{Log}(z)$ in $D$ to deduce that $w \to w_0$ as $z \to z_0$.
Now using that $z = e^w$ we see that what we have here is the reciprocal of
the derivative of the exponential function, whence

\[ \operatorname{Log}'(z_0) = \lim_{w \to w_0} \frac{1}{\dfrac{e^w - e^{w_0}}{w - w_0}} = \frac{1}{\exp'(w_0)} = \frac{1}{\exp(w_0)} = \frac{1}{z_0} . \]

Since this is well-defined everywhere but for $z_0 = 0$, which does not belong
to $D$, we see that $\operatorname{Log}(z)$ is analytic in $D$.


Other branches


The choice of branch for the logarithm is basically that, a choice. It is 
certainly not the only one. We can make the logarithm function single- 
valued in other regions of the complex plane by choosing a different branch 
for the argument function. 

For example, another popular choice is to consider the function $\operatorname{Arg}_0(z)$
which is the value of the argument function for which

\[ 0 \le \operatorname{Arg}_0(z) < 2\pi . \]

This function, like $\operatorname{Arg}(z)$, is single-valued but discontinuous; however the
discontinuity is now along the positive real axis, since approaching a positive
real number from the upper half-plane we would conclude that its argument
tends to 0 whereas approaching it from the lower half-plane the argument
would tend to $2\pi$. We can therefore define a branch $\operatorname{Log}_0(z)$ of the logarithm
by

\[ \operatorname{Log}_0(z) = \operatorname{Log} |z| + i \operatorname{Arg}_0(z) . \]

This branch then has a branch cut along the non-negative real axis, but it is
continuous in its complement $D_0$ as shown in Figure 2.2. The same argument
as before shows that $\operatorname{Log}_0(z)$ is analytic in $D_0$ with derivative given by

\[ \operatorname{Log}_0'(z_0) = \frac{1}{z_0} \quad\text{for all } z_0 \text{ in } D_0 . \]

Figure 2.2: Two further branches of the logarithm.


There are of course many other branches. For example, let $\tau$ be any real
number and define the branch $\operatorname{Arg}_\tau(z)$ of the argument function to take the
values

\[ \tau \le \operatorname{Arg}_\tau(z) < \tau + 2\pi . \]

This gives rise to a branch $\operatorname{Log}_\tau(z)$ of the logarithm function defined by

\[ \operatorname{Log}_\tau(z) = \operatorname{Log} |z| + i \operatorname{Arg}_\tau(z) , \]

which has a branch cut emanating from the origin and consisting of all those
points $z$ with $\arg(z) = \tau$ modulo $2\pi$. Again the same arguments show that
$\operatorname{Log}_\tau(z)$ is analytic everywhere on the complement $D_\tau$ of the branch cut, as
shown in Figure 2.2, and its derivative is given by

\[ \operatorname{Log}_\tau'(z_0) = \frac{1}{z_0} \quad\text{for all } z_0 \text{ in } D_\tau . \]

The choice of branch is immaterial for many properties of the logarithm, 
although it is important that a choice be made. Different applications may 
require choosing one branch over another. Provided one is consistent this 
should not cause any problems. 

As an example suppose that we are faced with computing the derivative
of the function $f(z) = \log(z^2 + 2iz + 2)$ at the point z = i. We need to
choose a branch of the logarithm which is analytic in a region containing a
neighbourhood of the point $i^2 + 2i \cdot i + 2 = -1$. The principal branch is not
analytic there, so we have to choose another branch. Suppose that we choose
$\operatorname{Log}_0(z)$. Then, by the chain rule,

$$f'(i) = \left. \frac{2z + 2i}{z^2 + 2iz + 2} \right|_{z=i} = \frac{2i + 2i}{i^2 + 2i \cdot i + 2} = -4i .$$

Any other valid branch would of course give the same result.
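The value $-4i$ can be checked numerically. The following is a minimal sketch, not part of the original notes: the helper names `arg0`, `log0` and `central_difference` are illustrative, and the derivative is approximated by a central difference with a small real step.

```python
import cmath
import math

def arg0(z: complex) -> float:
    """Argument of z chosen in [0, 2*pi): the branch cut lies on the non-negative real axis."""
    theta = cmath.phase(z)  # principal value, in (-pi, pi]
    return theta if theta >= 0 else theta + 2 * math.pi

def log0(z: complex) -> complex:
    """Branch Log_0(z) = Log|z| + i Arg_0(z)."""
    return complex(math.log(abs(z)), arg0(z))

def f(z: complex) -> complex:
    return log0(z * z + 2j * z + 2)

def central_difference(func, z0: complex, h: float = 1e-6) -> complex:
    """Approximate func'(z0) by a symmetric difference quotient."""
    return (func(z0 + h) - func(z0 - h)) / (2 * h)

numeric = central_difference(f, 1j)
exact = -4j
print(numeric, exact)
```

Replacing `log0` by the principal branch `cmath.log` in `f` makes the difference quotient blow up as the step shrinks, because the two sample points then land on opposite sides of the branch cut through $-1$.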


2.1.7 Complex powers 


With the logarithm function at our disposal, we are able to define complex
powers of complex numbers. Let $\alpha$ be a complex number. Then for all $z \neq 0$,
we define the $\alpha$-th power $z^\alpha$ of z by

$$z^\alpha = e^{\alpha \log(z)} = e^{\alpha \operatorname{Log}|z| + i \alpha \arg(z)} . \qquad (2.22)$$

The multiple-valuedness of the argument means that generically there will
be an infinite number of values for $z^\alpha$. We can rewrite the above expression
a little to make this manifest:

$$z^\alpha = e^{\alpha \operatorname{Log}|z| + i \alpha \operatorname{Arg}(z) + i \alpha 2\pi k} = e^{\alpha \operatorname{Log}(z)} \, e^{i \alpha 2\pi k}$$

for $k = 0, \pm 1, \pm 2, \ldots$.
Depending on $\alpha$ we will have either one, finitely many or infinitely many
values of $\exp(i 2\pi \alpha k)$. Suppose that $\alpha$ is real. If $\alpha = n$ is an integer then
so is $\alpha k = nk$ and $\exp(i 2\pi \alpha k) = \exp(i 2\pi n k) = 1$. There is therefore only
one value for $z^n$. This is as we expect, since in this case we have

$$z^n = \begin{cases} 1 & \text{for } n = 0, \\ \underbrace{z z \cdots z}_{n \text{ times}} & \text{for } n > 0, \\ \dfrac{1}{z^{-n}} & \text{for } n < 0. \end{cases}$$


If $\alpha = p/q$ is a rational number, where we have chosen the integers p and
q to have no common factors (i.e., to be coprime), then $z^{p/q}$ will have a
finite number of values. Indeed consider $\exp(i 2\pi k p/q)$ as k ranges over the
integers. It is clear that this exponential takes the same values for k and for
k + q:

$$e^{i 2\pi (k+q) p/q} = e^{i 2\pi (k (p/q) + p)} = e^{i 2\pi k (p/q) + i 2\pi p} = e^{i 2\pi k p/q} ,$$

where we have used the addition and periodicity properties
of the exponential function. Therefore $z^{p/q}$ will have at most q distinct values,
corresponding to the above formula with, say, k = 0, 1, 2, ..., q - 1. In fact,
it will have precisely q distinct values, as we will see below. Finally, if $\alpha$
is irrational, then $z^\alpha$ will possess an infinite number of values. To see this
notice that if there are integers k and k' for which $e^{i \alpha 2\pi k} = e^{i \alpha 2\pi k'}$ then
we must have that $e^{i \alpha 2\pi (k - k')} = 1$, which means that $\alpha (k - k')$ must be an
integer. Since $\alpha$ is irrational, this can only be true if k = k'.
For example, let us compute $1^{1/q}$. According to the formula,

$$1^{1/q} = e^{\operatorname{Log}(1)/q} \, e^{i 2\pi (k/q)} = e^{i 2\pi (k/q)}$$

as k ranges over the integers. As discussed above only the q values k =
0, 1, 2, ..., q - 1 will be different. The values of $1^{1/q}$ are known as q-th
roots of unity. They each have the property that their q-th power is equal
to 1: $(1^{1/q})^q = 1$, as can be easily seen from the above expression. Let
$\omega = \exp(i 2\pi/q)$ correspond to the k = 1 value of $1^{1/q}$. Then the q-th roots
of unity are given by $1, \omega, \omega^2, \ldots, \omega^{q-1}$, and there are q of them. The q-th
roots of unity lie on the unit circle |z| = 1 in the complex plane and define
the vertices of a regular q-gon. For example, in Figure 2.3 we depict the q-th
roots of unity for q = 3, 5, 7, 11.
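As a small illustration (a sketch only; the function name `roots_of_unity` is ours, not the notes'), the q-th roots of unity can be generated as powers of $\omega = \exp(i 2\pi/q)$ and checked against the properties just stated: unit modulus, q-th power equal to 1, and q distinct values.

```python
import cmath
import math

def roots_of_unity(q: int) -> list[complex]:
    """Return 1, omega, omega^2, ..., omega^(q-1) with omega = exp(2*pi*i/q)."""
    omega = cmath.exp(2j * math.pi / q)  # the k = 1 root
    return [omega ** k for k in range(q)]

roots = roots_of_unity(5)
for w in roots:
    # each root has modulus 1 and its 5th power is 1, up to rounding
    print(w, abs(w), w ** 5)
```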

Figure 2.3: Some roots of unity.

Let z be a nonzero complex number and suppose that we are after its
q-th roots. Writing z in polar form $z = |z| \exp(i\theta)$, we have

$$z^{1/q} = |z|^{1/q} \, e^{i\theta/q} \, \omega^k \quad \text{for } k = 0, 1, 2, \ldots, q-1 .$$

In other words the q different values of $z^{1/q}$ are obtained from any one value
by multiplying it by the q powers of $\omega$, that is, by the q-th roots of unity.
If p is any integer, we can then take the p-th power of the above formula:

$$z^{p/q} = |z|^{p/q} \, e^{i p \theta/q} \, \omega^{pk} \quad \text{for } k = 0, 1, 2, \ldots, q-1 .$$

If p and q are coprime, the $\omega^{pk}$ for k = 0, 1, 2, ..., q - 1 are different. Indeed,
suppose that $\omega^{pk} = \omega^{pk'}$ for k and k' between 0 and q - 1. Then $\omega^{p(k-k')} = 1$,
which means that p(k - k') has to be a multiple of q. Because p and q are
coprime and |k - k'| < q, this can only happen when k = k'. Therefore we see
that indeed a rational power p/q (with p and q coprime) of a complex number
has precisely q values.
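The count of q values can be confirmed numerically. This is a hedged sketch: `power_values` is a hypothetical helper that evaluates $\exp(\alpha(\operatorname{Log} z + i 2\pi k))$ over a range of k and keeps only values that differ by more than a small tolerance.

```python
import cmath
import math

def power_values(z: complex, p: int, q: int, k_values: range) -> list[complex]:
    """Distinct values of z**(p/q) = exp((p/q) * (Log z + 2*pi*i*k)) over the given k."""
    alpha = p / q
    distinct: list[complex] = []
    for k in k_values:
        w = cmath.exp(alpha * (cmath.log(z) + 2j * math.pi * k))
        # keep w only if it is not (numerically) equal to a value already found
        if all(abs(w - v) > 1e-9 for v in distinct):
            distinct.append(w)
    return distinct

three_quarters = power_values(1 + 1j, 3, 4, range(-20, 21))
one_third = power_values(1 + 1j, 1, 3, range(-20, 21))
print(len(three_quarters), len(one_third))
```

Scanning 41 consecutive values of k is far more than needed; it is only meant to show that no new values appear beyond the first q.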

Let us now consider complex powers. If $\alpha = a + ib$ is not real (so that
$b \neq 0$), then $z^\alpha$ will always have an infinite number of values. Indeed, notice
that the last term in the following expression takes a different value for each
integer k:

$$e^{i \alpha 2\pi k} = e^{i (a + ib) 2\pi k} = e^{i 2\pi k a} \, e^{-2\pi k b} .$$

For example, let us compute $i^i$. By definition,

$$i^i = e^{i \log(i)} = e^{i (\operatorname{Log}(i) + i 2\pi k)} = e^{i (i\pi/2 + i 2\pi k)} = e^{-\pi/2} \, e^{-2\pi k}$$

for $k = 0, \pm 1, \pm 2, \ldots$, which interestingly enough is real.
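A short numerical sketch of this computation follows (the function name `i_to_the_i` is illustrative). Python's built-in complex power uses the principal branch, so `1j ** 1j` should agree with the k = 0 value.

```python
import cmath
import math

def i_to_the_i(k: int) -> complex:
    """The k-th value of i**i = exp(i * (Log(i) + 2*pi*i*k))."""
    return cmath.exp(1j * (cmath.log(1j) + 2j * math.pi * k))

values = {k: i_to_the_i(k) for k in range(-2, 3)}
for k, v in values.items():
    print(k, v)

# principal value, computed by Python's own complex power
print(1j ** 1j)
```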


Choosing a branch for the complex power 


Every branch of the logarithm gives rise to a branch of $z^\alpha$. In particular we
define the principal branch of $z^\alpha$ to be $\exp(\alpha \operatorname{Log}(z))$. Since the exponential
function is entire, the principal branch of $z^\alpha$ is analytic in the domain D
where Log(z) is analytic. We can compute its derivative for any point $z_0$ in
D using the chain rule (2.12):

$$\left. \frac{d}{dz} \left( e^{\alpha \operatorname{Log}(z)} \right) \right|_{z=z_0} = e^{\alpha \operatorname{Log}(z_0)} \, \frac{\alpha}{z_0} .$$

Given any nonzero $z_0$ in the complex plane, we can choose a branch of the
logarithm so that the function $z^\alpha$ is analytic in a neighbourhood of $z_0$. We
can compute its derivative there and we see that the following equation holds

$$\left. \frac{d}{dz} \left( z^\alpha \right) \right|_{z=z_0} = \alpha \, z_0^\alpha \, \frac{1}{z_0} ,$$

provided that we use the same branch of $z^\alpha$ on both sides of the equation.
One might be tempted to write the right-hand side of the above equation
as $\alpha z_0^{\alpha - 1}$, and indeed this is correct, since the complex powers satisfy many
of the identities that we are familiar with from real powers. For example,
one can easily show that for any complex numbers $\alpha$ and $\beta$

$$z^\alpha z^\beta = z^{\alpha + \beta}$$

provided that the same branch of the logarithm, and hence of the complex
power, is chosen on both sides of the equation. Nevertheless, there is one
identity that does not hold. Suppose that $\alpha$ is a complex number and let
$z_1$ and $z_2$ be nonzero complex numbers. Then it is not true in general that
$z_1^\alpha z_2^\alpha$ and $(z_1 z_2)^\alpha$ agree, even if, as we always should, we choose the same
branch of the complex power on both sides of the equation.
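A concrete counterexample, sketched with the principal branch (the helper `principal_power` and the sample values are our choices): for $z_1 = z_2 = -1$ and $\alpha = 1/2$ the product of powers is $i \cdot i = -1$, while the power of the product is $1^{1/2} = 1$. The additive identity for exponents, by contrast, holds.

```python
import cmath

def principal_power(z: complex, alpha: complex) -> complex:
    """Principal branch of z**alpha, i.e. exp(alpha * Log(z))."""
    return cmath.exp(alpha * cmath.log(z))

z1 = z2 = -1 + 0j
alpha = 0.5

product_of_powers = principal_power(z1, alpha) * principal_power(z2, alpha)
power_of_product = principal_power(z1 * z2, alpha)
print(product_of_powers, power_of_product)  # these disagree

# z**a * z**b == z**(a + b) on the same branch
lhs = principal_power(z1, 0.3) * principal_power(z1, 0.4)
rhs = principal_power(z1, 0.7)
print(lhs, rhs)
```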

We end this section with the observation that the function $z^z$ is analytic
wherever the chosen branch of the logarithm function is defined. Indeed,
$z^z = \exp(z \log(z))$ and its principal branch is defined to be the function
$\exp(z \operatorname{Log}(z))$, which as we now show is analytic in D. Taking the derivative
we see that

$$\left. \frac{d}{dz} \left( e^{z \operatorname{Log}(z)} \right) \right|_{z=z_0} = e^{z_0 \operatorname{Log}(z_0)} \left( \operatorname{Log}(z_0) + 1 \right) ,$$

which exists everywhere on D. Again a similar result holds for any other
branch provided we are consistent and take the same branches of the
logarithm on both sides of the following equation:

$$\left. \frac{d}{dz} \left( z^z \right) \right|_{z=z_0} = z_0^{z_0} \left( \log(z_0) + 1 \right) .$$


2.2 Complex integration 


Having discussed differentiation of complex-valued functions, it is time to 
now discuss integration. In real analysis differentiation and integration are 
roughly speaking inverse operations. We will see that something similar 
also happens in the complex domain; but in addition, and this is unique to 
complex analytic functions, differentiation and integration are also roughly 
equivalent operations, in the sense that we will be able to take derivatives 
by performing integrals.


2.2.1 Complex integrals


There is a sense in which the integral of a complex-valued function is a trivial
extension of the standard integral one learns about in calculus. Suppose that
f is a complex-valued function of a real variable t. We can decompose f(t)
into its real and imaginary parts f(t) = u(t) + i v(t), where u and v are now
real-valued functions of a real variable. We can therefore define the integral
$\int_a^b f(t)\,dt$ of f(t) on the interval [a,b] as

$$\int_a^b f(t)\,dt = \int_a^b u(t)\,dt + i \int_a^b v(t)\,dt ,$$

provided that the functions u and v are integrable. We will not develop
a formal theory of integrability in this course. You should nevertheless be
aware of the fact that whereas not every function is integrable, a continuous
function always is. Hence, for example, if f is a continuous function in the
interval [a,b] then the integral $\int_a^b f(t)\,dt$ will always exist, since u and v are
continuous and hence integrable.

This integral satisfies many of the properties that real integrals obey. For
instance, it is (complex) linear, so that if $\alpha$ and $\beta$ are complex numbers and
f and g are complex-valued functions of t, then

$$\int_a^b \left( \alpha f(t) + \beta g(t) \right) dt = \alpha \int_a^b f(t)\,dt + \beta \int_a^b g(t)\,dt .$$

It also satisfies a complex version of the Fundamental Theorem of Calculus.
This theorem states that if f(t) is continuous in [a,b] and there exists a
function F(t) also defined on [a,b] such that $\dot F(t) = f(t)$ for all $a \le t \le b$,
where $\dot F(t) = \frac{dF}{dt}$, then

$$\int_a^b f(t)\,dt = \int_a^b \dot F(t)\,dt = F(b) - F(a) . \qquad (2.23)$$


This follows from the similar theorem for real integrals, as we now show. Indeed, let us
decompose both f and F into real and imaginary parts: f(t) = u(t) + i v(t) and F(t) =
U(t) + i V(t). Then since F is an antiderivative, $\dot F(t) = \dot U(t) + i \dot V(t) = f(t) = u(t) + i v(t)$,
whence $\dot U(t) = u(t)$ and $\dot V(t) = v(t)$. Therefore, by definition

$$\begin{aligned}
\int_a^b f(t)\,dt &= \int_a^b u(t)\,dt + i \int_a^b v(t)\,dt \\
&= U(b) - U(a) + i \left( V(b) - V(a) \right) \\
&= U(b) + i V(b) - \left( U(a) + i V(a) \right) \\
&= F(b) - F(a) ,
\end{aligned}$$


where to reach the second line we used the real version of the fundamental theorem of
calculus for the real and imaginary parts of the integral.

A final useful property of the complex integral is that

$$\left| \int_a^b f(t)\,dt \right| \le \int_a^b |f(t)|\,dt . \qquad (2.24)$$

This result makes sense intuitively because in integrating f(t) one might
encounter cancellations which do not occur while integrating the non-negative
quantity |f(t)|.
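Both (2.23) and (2.24) can be checked on a simple case. The sketch below is an illustration under our own choices: $f(t) = e^{it}$ on $[0, \pi]$ with antiderivative $F(t) = -i e^{it}$, and a composite Simpson rule (`integrate`, a hypothetical helper) applied directly to the complex-valued integrand.

```python
import cmath
import math

def integrate(f, a: float, b: float, n: int = 1000) -> complex:
    """Composite Simpson rule on [a, b] (n even); f may be complex-valued."""
    h = (b - a) / n
    total = f(a) + f(b)
    for j in range(1, n):
        total += (4 if j % 2 else 2) * f(a + j * h)
    return total * h / 3

def f(t: float) -> complex:
    return cmath.exp(1j * t)

def F(t: float) -> complex:
    return -1j * cmath.exp(1j * t)  # antiderivative: dF/dt = f

a, b = 0.0, math.pi
integral = integrate(f, a, b)
ftc_value = F(b) - F(a)                              # right-hand side of (2.23)
abs_integral = integrate(lambda t: abs(f(t)), a, b)  # right-hand side of (2.24)

print(integral, ftc_value)
print(abs(integral), "<=", abs_integral)
```

Here the modulus of the integral is 2 while the integral of the modulus is $\pi$, so the inequality is strict: the rotating phase of $e^{it}$ produces exactly the cancellations mentioned above.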


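This last property follows from the similar property of real integrals. Let us see this. Write
the complex integral $\int_a^b f(t)\,dt$ in polar form:

$$\int_a^b f(t)\,dt = R \, e^{i\theta} ,$$

where

$$R = \left| \int_a^b f(t)\,dt \right| .$$

On the other hand,

$$R = \int_a^b e^{-i\theta} f(t)\,dt .$$

Write $e^{-i\theta} f(t) = U(t) + i V(t)$ where U(t) and V(t) are real-valued functions. Then
because R is real,

$$R = \int_a^b U(t)\,dt .$$

But now,

$$U(t) = \operatorname{Re} \left( e^{-i\theta} f(t) \right) \le \left| e^{-i\theta} f(t) \right| = |f(t)| .$$

Therefore, from the properties of real integrals,

$$\int_a^b U(t)\,dt \le \int_a^b |f(t)|\,dt ,$$

which proves the desired result.


2.2.2 Contour integrals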


Much more interesting is the integration of complex-valued functions of a
complex variable. We would like to be able to make sense out of something
like

$$\int_{z_0}^{z_1} f(z)\,dz ,$$

where $z_0$ and $z_1$ are complex numbers. We are immediately faced with a
difficulty. Unlike the case of an interval [a,b] when it is fairly obvious how
to go from a to b, here $z_0$ and $z_1$ are points in the complex plane and there
are many ways to go from one point to the other. Therefore as it stands,
the above integral is ambiguous. The way out of this ambiguity is to specify
a path joining $z_0$ to $z_1$ and then integrate the function along the path. In
order to do this we will have to introduce some notation.

The integral along a parametrised curve


Let $z_0$ and $z_1$ be two points in the complex plane. One has an intuitive notion
of what one means by a curve joining $z_0$ and $z_1$. Physically, we can think
of a point-particle moving in the complex plane, starting at some time $t_0$ at
the point $z_0$ and ending at some later time $t_1$ at the point $z_1$. At any given
instant in time $t_0 \le t \le t_1$, the particle is at the point z(t) in the complex
plane. Therefore we see that a curve joining $z_0$ and $z_1$ can be defined by
a function z(t) taking points t in the interval $[t_0, t_1]$ to points z(t) in the
complex plane in such a way that $z(t_0) = z_0$ and $z(t_1) = z_1$. Let us make
this a little more precise. By a (parametrised) curve joining $z_0$ and $z_1$ we
shall mean a continuous function $z : [t_0, t_1] \to C$ such that $z(t_0) = z_0$ and
$z(t_1) = z_1$. We can decompose z into its real and imaginary parts, and this
is equivalent to two continuous real-valued functions x(t) and y(t) defined
on the interval $[t_0, t_1]$ such that $x(t_0) = x_0$ and $x(t_1) = x_1$ and similarly for
y(t): $y(t_0) = y_0$ and $y(t_1) = y_1$, where $z_0 = x_0 + i y_0$ and $z_1 = x_1 + i y_1$.
We say that the curve is smooth if its velocity $\dot z(t)$ is a continuous function
$[t_0, t_1] \to C$ which is never zero.

Let $\Gamma$ be a smooth curve joining $z_0$ to $z_1$, and let f(z) be a complex-valued
function which is continuous on $\Gamma$. Then we define the integral of f along
$\Gamma$ by

$$\int_\Gamma f(z)\,dz = \int_{t_0}^{t_1} f(z(t)) \, \dot z(t)\,dt . \qquad (2.25)$$

By hypothesis, the integrand, being a product of continuous functions, is
itself continuous and hence the integral exists.

Let us compute some examples. Consider the function $f(z) = x^2 + i y^2$
integrated along the smooth curve parametrised by z(t) = t + it for $0 \le t \le 1$.
As shown in Figure 2.4 this is the straight line segment joining the origin and
the point 1 + i. Decomposing z(t) = x(t) + i y(t) into real and imaginary parts,
we see that x(t) = y(t) = t. Therefore $f(z(t)) = t^2 + i t^2$ and $\dot z(t) = 1 + i$.
Putting it all together, using complex linearity of the integral and performing
the elementary real integral, we find the following result

$$\int_\Gamma f(z)\,dz = \int_0^1 (t^2 + i t^2)(1 + i)\,dt = (1 + i)^2 \int_0^1 t^2\,dt = 2i \left. \frac{t^3}{3} \right|_0^1 = \frac{2i}{3} .$$
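Definition (2.25) translates directly into a numerical recipe. The following sketch (the name `contour_integral` and the use of Simpson's rule are our assumptions, not the notes') integrates $f(z(t)) \dot z(t)$ over the parameter interval and reproduces the value $2i/3$ for the example above.

```python
def contour_integral(f, z, zdot, t0: float, t1: float, n: int = 1000) -> complex:
    """Approximate the integral of f(z(t)) * zdot(t) over [t0, t1] by Simpson's rule (n even)."""
    def integrand(t: float) -> complex:
        return f(z(t)) * zdot(t)

    h = (t1 - t0) / n
    total = integrand(t0) + integrand(t1)
    for j in range(1, n):
        total += (4 if j % 2 else 2) * integrand(t0 + j * h)
    return total * h / 3

def f(w: complex) -> complex:
    return w.real ** 2 + 1j * w.imag ** 2  # f(z) = x^2 + i y^2

result = contour_integral(f, lambda t: complex(t, t), lambda t: 1 + 1j, 0.0, 1.0)
print(result)  # close to 2i/3
```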

Consider now the function f(z) = 1/z integrated along the smooth curve
$\Gamma$ parametrised by $z(t) = R \exp(i 2\pi t)$ for $0 \le t \le 1$, where $R \neq 0$. As
shown in Figure 2.4 the resulting curve is the circle of radius R centred about
the origin. Here $f(z(t)) = (1/R) \exp(-i 2\pi t)$ and $\dot z(t) = 2\pi i R \exp(i 2\pi t)$.
Putting it all together we obtain

$$\int_\Gamma f(z)\,dz = \int_0^1 \frac{2\pi i R e^{i 2\pi t}}{R e^{i 2\pi t}}\,dt = 2\pi i \int_0^1 dt = 2\pi i . \qquad (2.26)$$

Figure 2.4: Two parametrised curves.


Notice that the result is independent of the radius. This is in sharp contrast 
with real integrals, which we are used to interpreting physically in terms of area.
In fact, the above integral behaves more like a charge than like an area. 
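The independence of the radius in (2.26) is easy to observe numerically. In this sketch (`circle_integral_of_inverse` is a hypothetical name) the integrand $f(z(t)) \dot z(t)$ is sampled on a uniform grid and averaged; since the parameter interval has length 1 and the integrand is constant up to rounding, the average is the integral.

```python
import cmath
import math

def circle_integral_of_inverse(radius: float, n: int = 1000) -> complex:
    """Integral of 1/z along z(t) = R exp(2*pi*i*t), 0 <= t <= 1, by averaging the integrand."""
    total = 0j
    for j in range(n):
        t = j / n
        z = radius * cmath.exp(2j * math.pi * t)
        zdot = 2j * math.pi * radius * cmath.exp(2j * math.pi * t)
        total += (1 / z) * zdot
    return total / n

results = {R: circle_integral_of_inverse(R) for R in (0.5, 1.0, 10.0)}
for R, value in results.items():
    print(R, value)  # each value is close to 2*pi*i
```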

Finally let us consider the function f(z) = 1 along any smooth curve $\Gamma$
parametrised by z(t) for $0 \le t \le 1$. It may seem that we do not have enough
information to compute the integral, but let us see how far we can get with
the information given. The integral becomes

$$\int_\Gamma f(z)\,dz = \int_0^1 \dot z(t)\,dt .$$

Using the complex version of the fundamental theorem of calculus, we have

$$\int_0^1 \dot z(t)\,dt = z(1) - z(0) ,$$

independent of the actual curve used to join the two points! Notice that this
integral is therefore not the length of the curve as one might think from the
notation.


The length of a curve and a useful estimate 


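The length of the curve can be computed, but the integral is not related to
the complex dz but the real |dz|. Indeed, if $\Gamma$ is a curve parametrised by
z(t) = x(t) + i y(t) for $t \in [t_0, t_1]$, consider the real integral

$$\int_{t_0}^{t_1} |\dot z(t)|\,dt = \int_{t_0}^{t_1} \sqrt{\dot x(t)^2 + \dot y(t)^2}\,dt ,$$

which is the integral of the infinitesimal line element $\sqrt{dx^2 + dy^2}$ along the
curve. Therefore, the integral is the (arc)length $\ell(\Gamma)$ of the curve:

$$\ell(\Gamma) = \int_\Gamma |dz| = \int_{t_0}^{t_1} |\dot z(t)|\,dt . \qquad (2.27)$$

This immediately yields a useful estimate on integrals along curves, analogous
to equation (2.24). Indeed, suppose that $\Gamma$ is a curve parametrised by z(t)
for $t \in [t_0, t_1]$. Then,

$$\begin{aligned}
\left| \int_\Gamma f(z)\,dz \right| &= \left| \int_{t_0}^{t_1} f(z(t)) \, \dot z(t)\,dt \right| \\
&\le \int_{t_0}^{t_1} |f(z(t))| \, |\dot z(t)|\,dt \qquad \text{(using (2.24))} \\
&\le \max_{z \in \Gamma} |f(z)| \int_{t_0}^{t_1} |\dot z(t)|\,dt .
\end{aligned}$$

But this last integral is simply the length $\ell(\Gamma)$ of the curve, whence we have

$$\left| \int_\Gamma f(z)\,dz \right| \le \int_\Gamma |f(z)| \, |dz| \le \max_{z \in \Gamma} |f(z)| \; \ell(\Gamma) . \qquad (2.28)$$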


Results of this type are the bread and butter of analysis and in this part of 
the course we will have ample opportunity to use this particular one. 
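To see the estimate (2.28) at work, here is a sketch on an example of our own choosing: $f(z) = 1/z^2$ along the upper semicircle of radius 2, $z(t) = 2 e^{i\pi t}$ for $0 \le t \le 1$. The helper `simpson` is illustrative, and the maximum of $|f|$ on the curve is estimated by sampling.

```python
import cmath
import math

def simpson(g, a: float, b: float, n: int = 1000):
    """Composite Simpson rule on [a, b] (n even); g may be real- or complex-valued."""
    h = (b - a) / n
    total = g(a) + g(b)
    for j in range(1, n):
        total += (4 if j % 2 else 2) * g(a + j * h)
    return total * h / 3

def z(t: float) -> complex:
    return 2 * cmath.exp(1j * math.pi * t)

def zdot(t: float) -> complex:
    return 2j * math.pi * cmath.exp(1j * math.pi * t)

def f(w: complex) -> complex:
    return 1 / w ** 2

integral = simpson(lambda t: f(z(t)) * zdot(t), 0.0, 1.0)
length = simpson(lambda t: abs(zdot(t)), 0.0, 1.0)        # arclength, as in (2.27)
max_f = max(abs(f(z(j / 1000))) for j in range(1001))     # sampled maximum of |f|
bound = max_f * length

print(abs(integral), "<=", bound)
```

For this curve the integral equals 1 (from the antiderivative $-1/z$), while the bound is $\frac{1}{4} \cdot 2\pi = \pi/2$.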


Some further properties of the integrals along a curve 


We have just seen that one of the above integrals does not depend on the
actual path but just on the endpoints of the contour. We will devote the next
two sections to studying conditions for complex integrals to be independent
of the path; but before doing so, we discuss some general properties of the
integrals $\int_\Gamma f(z)\,dz$.

The first important property is that the integral is complex linear. That
is, if $\alpha$ and $\beta$ are complex numbers and f and g are functions which are
continuous on $\Gamma$, then

$$\int_\Gamma \left( \alpha f(z) + \beta g(z) \right) dz = \alpha \int_\Gamma f(z)\,dz + \beta \int_\Gamma g(z)\,dz .$$

The proof is routine and we leave it as an exercise.

The first nontrivial property is that the integral $\int_\Gamma f(z)\,dz$ does not
depend on the actual parametrisation of the curve $\Gamma$. In other words, it is
a "physical" property of the curve itself, meaning the set of points $\Gamma \subset C$
together with the direction along the curve, and not of the way in which we
go about traversing them.


The only difficult thing in showing this is coming up with a mathematical statement to
prove. Let z(t) for $t_0 \le t \le t_1$ and z'(t) for $t'_0 \le t \le t'_1$ be two smooth parametrisations
of the same curve $\Gamma$. This means that $z(t_0) = z'(t'_0)$ and $z(t_1) = z'(t'_1)$. We will say that
the parametrisations z(t) and z'(t) are equivalent if there exists a one-to-one differentiable
function $\lambda : [t'_0, t'_1] \to [t_0, t_1]$ such that $z'(t) = z(\lambda(t))$. In particular, this means that
$\lambda(t'_0) = t_0$ and $\lambda(t'_1) = t_1$. (It is possible to show that this is indeed an equivalence
relation.)

The condition of reparametrisation invariance of $\int_\Gamma f(z)\,dz$ can then be stated as follows.
Let z and z' be two equivalent parametrisations of a curve $\Gamma$. Then for any function f(z)
continuous on $\Gamma$, we have

$$\int_{t'_0}^{t'_1} f(z'(t)) \, \dot z'(t)\,dt = \int_{t_0}^{t_1} f(z(t)) \, \dot z(t)\,dt .$$

Let us prove this.

$$\begin{aligned}
\int_{t'_0}^{t'_1} f(z'(t)) \, \dot z'(t)\,dt &= \int_{t'_0}^{t'_1} f(z(\lambda(t))) \, \frac{d}{dt} z(\lambda(t))\,dt \\
&= \int_{t'_0}^{t'_1} f(z(\lambda(t))) \, \frac{dz(\lambda)}{d\lambda} \, \frac{d\lambda}{dt}\,dt \\
&= \int_{t_0}^{t_1} f(z(\lambda)) \, \frac{dz(\lambda)}{d\lambda}\,d\lambda ,
\end{aligned}$$

which after changing the name of the variable of integration from $\lambda$ to t (Shakespeare's
Theorem!), is seen to agree with

$$\int_{t_0}^{t_1} f(z(t)) \, \dot z(t)\,dt .$$


Because of reparametrisation invariance, we can always parametrise a
curve in such a way that the initial time is t = 0 and the final time is
t = 1. Indeed, let z(t) for $t_0 \le t \le t_1$ be any smooth parametrisation of a
curve $\Gamma$. Then define the parametrisation $z'(t) = z(t_0 + t(t_1 - t_0))$. Clearly,
$z'(0) = z(t_0)$ and $z'(1) = z(t_1)$, and moreover $\dot z'(t) = (t_1 - t_0) \, \dot z(t_0 + t(t_1 - t_0))$,
hence z' is also smooth.
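Reparametrisation invariance can be observed numerically. The sketch below uses choices of our own: the curve $z(t) = t + i t^2$, the non-analytic integrand $f(z) = \bar z$ (so that the agreement is not a consequence of path independence), and the reparametrisation $\lambda(s) = (s + s^2)/2$, which is one-to-one on [0, 1] with positive derivative.

```python
def simpson(g, a: float, b: float, n: int = 1000) -> complex:
    """Composite Simpson rule on [a, b] (n even) for a complex-valued integrand."""
    h = (b - a) / n
    total = g(a) + g(b)
    for j in range(1, n):
        total += (4 if j % 2 else 2) * g(a + j * h)
    return total * h / 3

def f(w: complex) -> complex:
    return w.conjugate()

def z(t: float) -> complex:
    return complex(t, t * t)

def zdot(t: float) -> complex:
    return complex(1.0, 2.0 * t)

def lam(s: float) -> float:
    return (s + s * s) / 2       # lam(0) = 0, lam(1) = 1

def lamdot(s: float) -> float:
    return (1 + 2 * s) / 2       # strictly positive on [0, 1]

# second parametrisation z'(s) = z(lam(s)), with velocity given by the chain rule
original = simpson(lambda t: f(z(t)) * zdot(t), 0.0, 1.0)
reparametrised = simpson(lambda s: f(z(lam(s))) * zdot(lam(s)) * lamdot(s), 0.0, 1.0)

print(original, reparametrised)  # both close to 1 + i/3
```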

Now let us notice that parametrised curves $\Gamma$ have a natural notion of
direction: this is the direction in which we traverse the curve. Choosing a
parametrisation z(t) for $0 \le t \le 1$, as we go from z(0) to z(1), we trace the
points in the curve in a given order, which we depict by an arrowhead on
the curve indicating the direction along which t increases, as in the curves
in Figure 2.4. A curve with such a choice of direction is said to be directed.
Given any directed curve $\Gamma$, we let $-\Gamma$ denote the directed curve with the
opposite direction; that is, with the arrow pointing in the opposite direction.
The final interesting property of the integral $\int_\Gamma f(z)\,dz$ is that

$$\int_{-\Gamma} f(z)\,dz = - \int_\Gamma f(z)\,dz . \qquad (2.29)$$

© To prove this it is enough to find two parametrisations for $\Gamma$ and $-\Gamma$ and compute the
integrals. By reparametrisation independence it does not matter which parametrisations
we choose. If z(t) for $0 \le t \le 1$ is a parametrisation for $\Gamma$, then z'(t) = z(1 - t) for
$0 \le t \le 1$ is a parametrisation for $-\Gamma$. Indeed, z'(0) = z(1) and z'(1) = z(0) and they
trace the same set of points. Let us compute, using $\dot z'(t) = -\dot z(1 - t)$ and the change of
variable s = 1 - t:

$$\int_{-\Gamma} f(z)\,dz = \int_0^1 f(z'(t)) \, \dot z'(t)\,dt = - \int_0^1 f(z(1 - t)) \, \dot z(1 - t)\,dt = - \int_0^1 f(z(s)) \, \dot z(s)\,ds = - \int_\Gamma f(z)\,dz .$$


Piecewise smooth curves and contour integrals 


Finally we have to generalise the integral $\int_\Gamma f(z)\,dz$ to curves which are not
necessarily smooth, but which are made out of smooth curves. Curves can
be composed: if $\Gamma_1$ is a curve joining $z_0$ to $z_1$ and $\Gamma_2$ is a curve joining $z_1$
to $z_2$, then we can make a curve $\Gamma$ joining $z_0$ to $z_2$ by first going to the
intermediate point $z_1$ via $\Gamma_1$ and then from there via $\Gamma_2$ to our destination
$z_2$. The resulting curve $\Gamma$ is still continuous, but it will generally fail to be
smooth, since the velocity need not be continuous at the intermediate point
$z_1$, as shown in the figure.

However such a curve is piecewise smooth, which means that it is made out
of smooth components by the composition procedure just outlined. In terms of
parametrisations, if $z_1(t)$ and $z_2(t)$, for $0 \le t \le 1$, are smooth parametrisations
for $\Gamma_1$ and $\Gamma_2$ respectively, then

$$z(t) = \begin{cases} z_1(2t) & \text{for } 0 \le t \le \tfrac{1}{2}, \\ z_2(2t - 1) & \text{for } \tfrac{1}{2} \le t \le 1 \end{cases}$$

is a parametrisation for $\Gamma$. Notice that it is well-defined and continuous at
$t = \tfrac{1}{2}$ precisely because $z_1(1) = z_2(0)$; however it need not be smooth there
since $\dot z_1(1)$ need not equal $\dot z_2(0)$. We can repeat this procedure and construct
curves which are not smooth but which are made out of a finite number of
smooth curves: one curve ending where the next starts. Such a piecewise
smooth curve will be called a contour from now on. If a contour $\Gamma$ is made
out of composing a finite number of smooth curves $\{\Gamma_j\}$ we will say that each
$\Gamma_j$ is a smooth component of $\Gamma$.

Let $\Gamma$ be a contour with n smooth components $\{\Gamma_j\}$ for j = 1, 2, ..., n.
If f(z) is a function continuous on $\Gamma$, then the contour integral of f along
$\Gamma$ is defined as

$$\int_\Gamma f(z)\,dz = \sum_{j=1}^n \int_{\Gamma_j} f(z)\,dz = \int_{\Gamma_1} f(z)\,dz + \int_{\Gamma_2} f(z)\,dz + \cdots + \int_{\Gamma_n} f(z)\,dz ,$$

where each of the $\int_{\Gamma_j} f(z)\,dz$ is defined by (2.25) relative to any smooth
parametrisation.


2.2.3 Independence of path 


In this section we will investigate conditions under which a contour integral
only depends on the endpoints of the contour, and not on the contour itself.
This is necessary preparatory material for Cauchy's integral theorem which
will be discussed in the next section.

We will say that an open subset U of the complex plane is connected, 
if every pair of points in U can be joined by a contour. A connected open 
subset of the complex plane will be called a domain. 

© What we have called connected here is usually called path-connected. We can allow
ourselves this abuse of notation because path-connectedness is easier to define and it can
be shown that the two notions agree for open subsets of the complex plane.


Fundamental Theorem of Calculus: contour integral version 


First we start with a contour integral version of the fundamental theorem of
calculus. Let D be a domain and let $f : D \to C$ be a continuous complex-valued
function defined on D. We say that f has an antiderivative in D if
there exists some function $F : D \to C$ such that

$$F'(z) = \frac{dF(z)}{dz} = f(z) .$$

Notice that F is therefore analytic in D. Now let $\Gamma$ be any contour in D with
endpoints $z_0$ and $z_1$. If f has an antiderivative F on D, the contour integral
is given by

$$\int_\Gamma f(z)\,dz = F(z_1) - F(z_0) . \qquad (2.30)$$

Let us first prove this for $\Gamma$ a smooth curve, parametrised by z(t) for
$0 \le t \le 1$. Then

$$\int_\Gamma f(z)\,dz = \int_0^1 f(z(t)) \, \dot z(t)\,dt = \int_0^1 \frac{dF(z(t))}{dt}\,dt .$$

Using the complex version of the fundamental theorem of calculus (2.23), we
see that

$$\int_\Gamma f(z)\,dz = F(z(1)) - F(z(0)) = F(z_1) - F(z_0) .$$

Now we consider the general case: $\Gamma$ a contour with smooth components
$\{\Gamma_j\}$ for j = 1, 2, ..., n. The curve $\Gamma_1$ starts in $z_0$ and ends in some
intermediate point $\tau_1$, $\Gamma_2$ starts in $\tau_1$ and ends in a second intermediate point $\tau_2$,
and so on until $\Gamma_n$, which starts in the intermediate point $\tau_{n-1}$ and ends in
$z_1$. Then

$$\begin{aligned}
\int_\Gamma f(z)\,dz &= \sum_{j=1}^n \int_{\Gamma_j} f(z)\,dz \\
&= \int_{\Gamma_1} f(z)\,dz + \int_{\Gamma_2} f(z)\,dz + \cdots + \int_{\Gamma_n} f(z)\,dz \\
&= F(\tau_1) - F(z_0) + F(\tau_2) - F(\tau_1) + \cdots + F(z_1) - F(\tau_{n-1}) \\
&= F(z_1) - F(z_0) ,
\end{aligned}$$

where we have used the definition of the contour integral and the result
proven above for each of the smooth components.

This result says that if a function f has an antiderivative, then its contour
integrals do not depend on the precise path, but only on the endpoints. Path
independence can also be rephrased in terms of closed contour integrals. We
say that a contour is closed if its endpoints coincide. The contour integral
along a closed contour $\Gamma$ is sometimes denoted $\oint_\Gamma$ when we wish to emphasise
that the contour is closed.
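The following sketch illustrates (2.30) on polygonal contours. The helpers `segment_integral` and `contour_integral` are illustrative names; each straight segment from a to b is parametrised by a + t(b - a) and integrated with Simpson's rule. For $f(z) = z^2$, which has the antiderivative $z^3/3$, two different contours from 0 to 1 + i give the same value, whereas for $f(z) = \bar z$, which has no antiderivative, they do not.

```python
def segment_integral(f, a: complex, b: complex, n: int = 200) -> complex:
    """Integral of f along the straight segment z(t) = a + t (b - a), 0 <= t <= 1."""
    a, b = complex(a), complex(b)
    h = 1.0 / n
    total = f(a) + f(b)
    for j in range(1, n):
        total += (4 if j % 2 else 2) * f(a + j * h * (b - a))
    return total * h / 3 * (b - a)  # zdot(t) = b - a is constant

def contour_integral(f, vertices) -> complex:
    """Sum of the integrals over the smooth components of a polygonal contour."""
    return sum(segment_integral(f, a, b) for a, b in zip(vertices, vertices[1:]))

def square(w: complex) -> complex:
    return w * w

def conj(w: complex) -> complex:
    return w.conjugate()

direct = [0, 1 + 1j]        # straight from 0 to 1 + i
bent = [0, 1, 1 + 1j]       # via the intermediate point 1

exact = (1 + 1j) ** 3 / 3   # F(z1) - F(z0) with F(z) = z^3 / 3
print(contour_integral(square, direct), contour_integral(square, bent), exact)
print(contour_integral(conj, direct), contour_integral(conj, bent))
```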


The path-independence lemma 


As a corollary of the above result, we see that if $\Gamma$ is a closed contour in some
domain D and $f : D \to C$ has an antiderivative in D, then

$$\oint_\Gamma f(z)\,dz = 0 .$$

This is clear because if the endpoints coincide, so that $z_0 = z_1$, then
$F(z_1) - F(z_0) = 0$.

In fact, let $f : D \to C$ be a continuous function on some domain D. Then
the following three statements are equivalent:

(a) f has an antiderivative F in D;

(b) The closed contour integral $\oint_\Gamma f(z)\,dz$ vanishes for all closed contours
$\Gamma$ in D; and

(c) The contour integrals $\int_\Gamma f(z)\,dz$ are independent of the path.

We shall call this result the Path-independence Lemma.
We have already proven that (a) implies (b) and (c). We will now show
that in fact (b) and (c) are equivalent.
Let $\Gamma_1$ and $\Gamma_2$ be any two contours in D sharing the same initial and final
endpoints: $z_0$ and $z_1$, say. Then consider the contour $\Gamma$ obtained by composing
$\Gamma_1$ with $-\Gamma_2$. This is a closed contour with initial and final endpoint $z_0$.
Therefore, using (2.29) for the integral along $-\Gamma_2$,

$$\oint_\Gamma f(z)\,dz = \int_{\Gamma_1} f(z)\,dz + \int_{-\Gamma_2} f(z)\,dz = \int_{\Gamma_1} f(z)\,dz - \int_{\Gamma_2} f(z)\,dz ,$$

whence $\oint_\Gamma f(z)\,dz = 0$ if and only if $\int_{\Gamma_1} f(z)\,dz = \int_{\Gamma_2} f(z)\,dz$. This shows
that (b) implies (c). Now we prove that, conversely, (c) implies (b). Let $\Gamma$
be any closed contour with endpoints $z_1 = z_0$. By path-independence, we
can evaluate the integral by taking the trivial contour which remains at $z_0$
for all $0 \le t \le 1$. This parametrisation is strictly speaking not smooth since
$\dot z(t) = 0$ for all t, but the integrand $f(z(t)) \, \dot z(t) = 0$ is still continuous, so
that the integral exists and is clearly zero. Hence $\oint_\Gamma f(z)\,dz = 0$ for all closed
contours $\Gamma$. Alternatively, we can pick any point $\tau$ in the contour not equal
to $z_0 = z_1$. We can think of the contour as made out of two contours: $\Gamma_1$ from
$z_0$ to $\tau$ and $\Gamma_2$ from $\tau$ to $z_1 = z_0$. We can therefore go from $z_0 = z_1$ to $\tau$ in two
ways: one is along $\Gamma_1$ and the other one is along $-\Gamma_2$. Path-independence
says that the result is the same:

$$\int_{\Gamma_1} f(z)\,dz = \int_{-\Gamma_2} f(z)\,dz = - \int_{\Gamma_2} f(z)\,dz ,$$


where we have used equation (2.29). Therefore,

$$0 = \int_{\Gamma_1} f(z)\,dz + \int_{\Gamma_2} f(z)\,dz = \oint_\Gamma f(z)\,dz .$$

Finally we finish the proof of the path-independence lemma by showing
that (c) implies (a); that is, if all contour integrals are path-independent,
then the function f has an antiderivative. The property of path-independence
suggests a way to define the antiderivative. Let us fix once and for all a point
$z_0$ in the domain D. Let z be an arbitrary point in D. Because D is connected
there will be a contour $\Gamma$ joining $z_0$ and z. Define a function F(z) by

$$F(z) = \int_\Gamma f(\zeta)\,d\zeta ,$$

where we have changed notation in the integral (Shakespeare's Theorem
again) not to confuse the variable of integration with the endpoint z of the
contour. By path-independence this integral is independent of the contour
and is therefore well-defined as a function of the endpoint z. We must now
check that it is an antiderivative for f.
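This construction can be tried out numerically before it is proved. The sketch below makes several assumptions of its own: f is taken to be the cosine, the base point is $z_0 = 0$, and the contour is the straight segment from $z_0$ to z (legitimate here because the cosine is entire and path-independent on the whole plane). The resulting F should agree with the sine, and a difference quotient of F should return f.

```python
import cmath

def segment_integral(f, a: complex, b: complex, n: int = 200) -> complex:
    """Integral of f along the straight segment from a to b (Simpson's rule in t)."""
    h = 1.0 / n
    total = f(a) + f(b)
    for j in range(1, n):
        total += (4 if j % 2 else 2) * f(a + j * h * (b - a))
    return total * h / 3 * (b - a)

def F(z: complex, z0: complex = 0j) -> complex:
    """Candidate antiderivative: the integral of cos from the fixed base point z0 to z."""
    return segment_integral(cmath.cos, z0, z)

z = 0.7 + 0.4j
h = 1e-4
derivative = (F(z + h) - F(z - h)) / (2 * h)  # central difference approximation to F'(z)

print(F(z), cmath.sin(z))
print(derivative, cmath.cos(z))
```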

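The derivative of F(z) is computed by

$$F'(z) = \lim_{\Delta z \to 0} \frac{1}{\Delta z} \left[ \int_{\Gamma'} f(\zeta)\,d\zeta - \int_\Gamma f(\zeta)\,d\zeta \right] ,$$

where $\Gamma'$ is any contour from $z_0$ to $z + \Delta z$. Since we are interested in the limit
of $\Delta z \to 0$, we can assume that $\Delta z$ is so small that $z + \Delta z$ is contained in some
open $\varepsilon$-disk about z which also belongs to D.[2] This means that the straight-line
segment $\Gamma''$ from z to $z + \Delta z$ belongs to D. By path-independence we
are free to choose the contour $\Gamma'$, and we exercise this choice by taking $\Gamma'$ to
be the composition of $\Gamma$ with this straight-line segment $\Gamma''$. Therefore,

$$\int_{\Gamma'} f(\zeta)\,d\zeta - \int_\Gamma f(\zeta)\,d\zeta = \int_\Gamma f(\zeta)\,d\zeta + \int_{\Gamma''} f(\zeta)\,d\zeta - \int_\Gamma f(\zeta)\,d\zeta = \int_{\Gamma''} f(\zeta)\,d\zeta ,$$

whence

$$F'(z) = \lim_{\Delta z \to 0} \frac{1}{\Delta z} \int_{\Gamma''} f(\zeta)\,d\zeta .$$

We parametrise the contour $\Gamma''$ by $\zeta(t) = z + t \Delta z$ for $0 \le t \le 1$. Then we
have

$$\begin{aligned}
F'(z) &= \lim_{\Delta z \to 0} \frac{1}{\Delta z} \int_0^1 f(z + t \Delta z) \, \dot\zeta(t)\,dt \\
&= \lim_{\Delta z \to 0} \frac{1}{\Delta z} \int_0^1 f(z + t \Delta z) \, \Delta z\,dt \\
&= \lim_{\Delta z \to 0} \int_0^1 f(z + t \Delta z)\,dt .
\end{aligned}$$

One might be tempted now to simply sneak the limit inside the integral, use
continuity of f and obtain

$$F'(z) = \int_0^1 \lim_{\Delta z \to 0} f(z + t \Delta z)\,dt = \int_0^1 f(z)\,dt = f(z) ,$$

which would finish the proof. However sneaking the limit inside the integral
is not always allowed since integration itself is a limiting process and limits
cannot always be interchanged.

[2] In more detail, since D is open we know that there exists some $\varepsilon > 0$ small enough
so that $D_\varepsilon(z)$ belongs to D. We then simply take $|\Delta z| < \varepsilon$, which we can do since we are
interested in the limit $\Delta z \to 0$.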


© A simple example showing that the order in which one takes limits matters is the following.
Consider the following limit

$$\lim_{n \to \infty} \lim_{m \to \infty} \frac{m}{m + n} = \lim_{n \to \infty} 1 = 1 ;$$

yet on the other hand

$$\lim_{m \to \infty} \lim_{n \to \infty} \frac{m}{m + n} = \lim_{m \to \infty} 0 = 0 .$$

Nevertheless, as we sketch below, in this case interchanging the limits
turns out to be a correct procedure due to the continuity of the integrand.


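© We want to prove here that indeed

$$\lim_{\Delta z \to 0} \int_0^1 f(z + t \Delta z)\,dt = f(z) .$$

We do this by showing that in this limit, the quantity

$$\int_0^1 f(z + t \Delta z)\,dt - f(z) = \int_0^1 \left[ f(z + t \Delta z) - f(z) \right] dt$$

goes to zero. We will prove that its modulus goes to zero, which is clearly equivalent. By
equation (2.24), we have

$$\left| \int_0^1 \left[ f(z + t \Delta z) - f(z) \right] dt \right| \le \int_0^1 \left| f(z + t \Delta z) - f(z) \right| dt .$$

By continuity of f we know that given any $\varepsilon > 0$ there exists a $\delta > 0$ such that

$$\left| f(z + t \Delta z) - f(z) \right| < \varepsilon \quad \text{whenever } |\Delta z| < \delta .$$

Since we are taking the limit $\Delta z \to 0$, we can take $|\Delta z| < \delta$, whence

$$\lim_{\Delta z \to 0} \left| \int_0^1 \left[ f(z + t \Delta z) - f(z) \right] dt \right| \le \lim_{\Delta z \to 0} \int_0^1 \left| f(z + t \Delta z) - f(z) \right| dt \le \int_0^1 \varepsilon\,dt = \varepsilon ,$$

for any $\varepsilon > 0$, where we have used equation (2.24) to arrive at the first inequality and the
continuity bound to arrive at the last. Hence,

$$\lim_{\Delta z \to 0} \left| \int_0^1 \left[ f(z + t \Delta z) - f(z) \right] dt \right| = 0 ,$$

so that

$$\lim_{\Delta z \to 0} \int_0^1 \left[ f(z + t \Delta z) - f(z) \right] dt = 0 .$$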

2.2.4 Cauchy’s Integral Theorem 


We have now laid the groundwork to be able to discuss one of the key results 
in complex analysis. The path-independence lemma tells us that a continuous 
function $f : D \to C$ in some domain D has an antiderivative if and only if
all its closed contour integrals vanish. Unfortunately it is impractical to 
check this hypothesis explicitly, so one would like to be able to conclude the 
vanishing of the closed contour integrals some other way. Cauchy’s integral 
theorem will tell us that, under some conditions, this is true if f is analytic. 
These conditions refer to the topology of the domain, so we have to first 
introduce a little bit of notation. 

Let us say that a contour is simple if it has no self-intersections. We 
define a loop to be a closed simple contour. We start by mentioning the 
celebrated Jordan curve lemma, a version of which states that any loop in 
the complex plane separates the plane into two domains with the loop as 
common boundary: one of which is bounded and is called the interior and 
one of which is unbounded and is called the exterior. 


This is a totally obvious statement and, like most such statements, extremely hard to prove,
requiring techniques of algebraic topology.


We say that a domain D is simply-connected if the interior domain 
of every loop in D lies wholly in D. Hence for example, a disk is simply 
connected, while a punctured disk is not: any circle around the puncture 
contains the puncture in its interior, but this has been excised from the disk. 
Intuitively speaking, a domain is simply-connected if any loop in the domain
can be continuously shrunk to a point without any point of the loop ever
leaving the domain.

We are ready to state the Cauchy Integral Theorem: Let $D \subset C$ be a
simply-connected domain and let $f : D \to C$ be an analytic function, then
for any loop $\Gamma$ in D, the contour integral vanishes:

$$\oint_\Gamma f(z)\,dz = 0 .$$

As an immediate corollary of this theorem and of the path-independence
lemma, we see that an analytic function in a simply-connected domain has
an antiderivative, which is itself analytic in D.
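A numerical illustration of the theorem, and of the role played by simple-connectedness, is sketched below. The loop is our own choice: the square with vertices $\pm 1 \pm i$ traversed counterclockwise. The exponential is analytic on the whole plane, so its loop integral should vanish; 1/z is analytic only on the punctured plane, which is not simply-connected, and its integral around the same loop is $2\pi i$ rather than zero.

```python
import cmath
import math

def segment_integral(f, a: complex, b: complex, n: int = 2000) -> complex:
    """Integral of f along the straight segment from a to b (Simpson's rule in t)."""
    h = 1.0 / n
    total = f(a) + f(b)
    for j in range(1, n):
        total += (4 if j % 2 else 2) * f(a + j * h * (b - a))
    return total * h / 3 * (b - a)

def loop_integral(f, vertices) -> complex:
    """Integral of f around the closed polygon through the given vertices, in order."""
    closed = list(vertices) + [vertices[0]]
    return sum(segment_integral(f, a, b) for a, b in zip(closed, closed[1:]))

# square around the origin, traversed counterclockwise
square = [1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]

entire = loop_integral(cmath.exp, square)          # analytic inside the loop
punctured = loop_integral(lambda w: 1 / w, square)  # singular at the origin

print(entire)                        # close to 0
print(punctured, 2j * math.pi)       # close to 2*pi*i
```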

We will actually prove a slightly weaker version of the theorem which 
requires the stronger hypothesis that f’(z) be continuous in D. Recall that 
analyticity only requires f'(z) to exist. The proof uses a version of Green’s 
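theorem which is valid in the complex plane. This theorem states that if
$V(x,y) = P(x,y)\,dx + Q(x,y)\,dy$ is a continuously differentiable vector field
in a simply-connected domain $D$ in the complex plane, and if $\Gamma$ is any
positively oriented loop in $D$, then the line integral of $V$ along $\Gamma$ can be written
as the area integral of the function $\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}$ on the interior $\mathrm{Int}(\Gamma)$ of $\Gamma$:

$$\oint_\Gamma \left( P(x,y)\,dx + Q(x,y)\,dy \right) = \iint_{\mathrm{Int}(\Gamma)} \left( \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} \right) dx\,dy . \tag{2.31}$$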
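As a sanity check on (2.31), the following Python sketch compares the two sides numerically on a rectangle. The vector field $P = -x^2 y$, $Q = x y^2$, the rectangle $[0,1] \times [0,2]$ and the helper names are illustrative assumptions, not part of the notes; both integrals are approximated by the midpoint rule, so agreement is only up to discretisation error.

```python
def line_integral_rectangle(P, Q, x0, y0, dx, dy, n=2000):
    """Counter-clockwise line integral of P dx + Q dy around the rectangle
    with lower left-hand corner (x0, y0) and size dx by dy (midpoint rule)."""
    hx, hy = dx / n, dy / n
    total = 0.0
    for k in range(n):
        x = x0 + (k + 0.5) * hx
        y = y0 + (k + 0.5) * hy
        total += P(x, y0) * hx          # bottom edge, left to right
        total += Q(x0 + dx, y) * hy     # right edge, upwards
        total -= P(x, y0 + dy) * hx     # top edge, right to left
        total -= Q(x0, y) * hy          # left edge, downwards
    return total


def area_integral_rectangle(F, x0, y0, dx, dy, n=400):
    """Midpoint-rule double integral of F over the same rectangle."""
    hx, hy = dx / n, dy / n
    total = 0.0
    for i in range(n):
        x = x0 + (i + 0.5) * hx
        for j in range(n):
            y = y0 + (j + 0.5) * hy
            total += F(x, y)
    return total * hx * hy


# Illustrative vector field (an assumption of this sketch).
P = lambda x, y: -x * x * y
Q = lambda x, y: x * y * y
curl = lambda x, y: y * y + x * x       # dQ/dx - dP/dy for the field above

lhs = line_integral_rectangle(P, Q, 0.0, 0.0, 1.0, 2.0)
rhs = area_integral_rectangle(curl, 0.0, 0.0, 1.0, 2.0)
print(lhs, rhs)
```

For this particular choice both sides can be computed by hand and equal $10/3$, so the two printed numbers should agree with each other and with that value to several decimal places.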


We will sketch a proof of this theorem below; but now let us use it to prove
the Cauchy Integral Theorem. Let $\Gamma$ be a loop in a simply-connected domain
$D$ in the complex plane, and let $f(z)$ be a function which is analytic in $D$.
Computing the contour integral, we find

$$\begin{aligned}
\oint_\Gamma f(z)\,dz &= \oint_\Gamma \left( u(x,y) + i\,v(x,y) \right) (dx + i\,dy) \\
&= \oint_\Gamma \left( u(x,y)\,dx - v(x,y)\,dy \right) + i \oint_\Gamma \left( v(x,y)\,dx + u(x,y)\,dy \right) .
\end{aligned}$$

By hypothesis, $f'(z)$ is continuous, which means that the vector fields $u\,dx - v\,dy$
and $v\,dx + u\,dy$ are continuously differentiable, whence we can use Green’s
Theorem (2.31) to deduce that

$$\oint_\Gamma f(z)\,dz = \iint_{\mathrm{Int}(\Gamma)} \left( -\frac{\partial v}{\partial x} - \frac{\partial u}{\partial y} \right) dx\,dy + i \iint_{\mathrm{Int}(\Gamma)} \left( \frac{\partial u}{\partial x} - \frac{\partial v}{\partial y} \right) dx\,dy ,$$

which vanishes by the Cauchy-Riemann equations (2.7).

Here we will sketch a proof of Green’s Theorem (2.31). The strategy will be the following.
We will approximate the interior of the loop by tiny squares (plaquettes) in such a way 
that the loop itself is approximated by the straight line segments which make up the edges 
of the squares. As the size of the plaquettes decreases, the approximation becomes better 
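and better. In the picture we have illustrated this by showing three approximations to the
unit disk. For each we show the value of the length $\ell$ of the contour and of the area $A$ of
its interior.

[Figure: three plaquette approximations to the unit disk, with ($\ell = 9.6$, $A = 2.9952$), ($\ell = 7.68$, $A = 2.9952$) and ($\ell = 7.68$, $A = 3.1104$), next to the unit disk itself, for which $\ell = 2\pi$ and $A = \pi$.]

In fact, it is a simple matter of careful bookkeeping to prove that in the limit,

$$\iint_{\mathrm{Int}(\Gamma)} = \lim_{\text{size} \to 0} \sum_{\text{plaquettes } \Pi} \iint_{\mathrm{Int}(\Pi)} .$$

Similarly for the contour integral,

$$\oint_\Gamma = \lim_{\text{size} \to 0} \sum_{\text{plaquettes } \Pi} \oint_\Pi .$$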


To see this notice that the contour integrals along internal edges common to two adjacent
plaquettes cancel, because reversing the orientation of a contour changes the sign of the
integral and we integrate twice along each such edge: once for each plaquette but in the
opposite orientation, as shown in the picture below. Therefore we only receive
contributions from the external edges. Since the region is simply-connected, these make up
the boundary of the region covered by the plaquettes.

[Figure: two adjacent plaquettes $\Pi_1$ and $\Pi_2$ whose common edge is traversed in opposite orientations, and the boundary $\Pi$ of their union.]

In the notation of the picture, then, one has

$$\oint_{\Pi_1} + \oint_{\Pi_2} = \oint_{\Pi} .$$


Therefore it is sufficient to prove formula (2.31) for the special case of one plaquette. To
this effect we will choose our plaquette $\Pi$ to have size $\Delta x \times \Delta y$ and lower left-hand
corner at the point $(x_0, y_0)$:

[Figure: the plaquette $\Pi$, with corners $(x_0, y_0)$, $(x_0 + \Delta x, y_0)$, $(x_0 + \Delta x, y_0 + \Delta y)$ and $(x_0, y_0 + \Delta y)$.]


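Performing the contour integral we have for $V(x,y) = P(x,y)\,dx + Q(x,y)\,dy$,

$$\begin{aligned}
\oint_\Pi V(x,y) &= \int_{(x_0,\,y_0)}^{(x_0+\Delta x,\,y_0)} V(x,y) + \int_{(x_0+\Delta x,\,y_0)}^{(x_0+\Delta x,\,y_0+\Delta y)} V(x,y) \\
&\quad + \int_{(x_0+\Delta x,\,y_0+\Delta y)}^{(x_0,\,y_0+\Delta y)} V(x,y) + \int_{(x_0,\,y_0+\Delta y)}^{(x_0,\,y_0)} V(x,y) .
\end{aligned}$$

Along the first and third contour integrals the value of $y$ is constant, whereas along the
second and fourth integrals it is the value of $x$ which is constant. Taking this into account,
we can rewrite the integrals as follows

$$\begin{aligned}
\oint_\Pi V(x,y) &= \int_{x_0}^{x_0+\Delta x} P(x, y_0)\,dx + \int_{y_0}^{y_0+\Delta y} Q(x_0 + \Delta x, y)\,dy \\
&\quad + \int_{x_0+\Delta x}^{x_0} P(x, y_0 + \Delta y)\,dx + \int_{y_0+\Delta y}^{y_0} Q(x_0, y)\,dy .
\end{aligned}$$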


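Exchanging the limits of integration in the third and fourth integrals, and picking up a
sign in each, we can rewrite the integrals as follows:

$$\oint_\Pi V(x,y) = \int_{y_0}^{y_0+\Delta y} \left[ Q(x_0 + \Delta x, y) - Q(x_0, y) \right] dy - \int_{x_0}^{x_0+\Delta x} \left[ P(x, y_0 + \Delta y) - P(x, y_0) \right] dx .$$

But now we make use of the facts that

$$Q(x_0 + \Delta x, y) - Q(x_0, y) = \int_{x_0}^{x_0+\Delta x} \frac{\partial Q(x,y)}{\partial x}\,dx ,$$

$$P(x, y_0 + \Delta y) - P(x, y_0) = \int_{y_0}^{y_0+\Delta y} \frac{\partial P(x,y)}{\partial y}\,dy ,$$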


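whence the integrals become

$$\begin{aligned}
\oint_\Pi V(x,y) &= \int_{y_0}^{y_0+\Delta y} \int_{x_0}^{x_0+\Delta x} \frac{\partial Q(x,y)}{\partial x}\,dx\,dy - \int_{x_0}^{x_0+\Delta x} \int_{y_0}^{y_0+\Delta y} \frac{\partial P(x,y)}{\partial y}\,dy\,dx \\
&= \int_{x_0}^{x_0+\Delta x} \int_{y_0}^{y_0+\Delta y} \left( \frac{\partial Q(x,y)}{\partial x} - \frac{\partial P(x,y)}{\partial y} \right) dx\,dy \\
&= \iint_{\mathrm{Int}(\Pi)} \left( \frac{\partial Q(x,y)}{\partial x} - \frac{\partial P(x,y)}{\partial y} \right) dx\,dy ,
\end{aligned}$$

which proves the formula for the plaquette $\Pi$.


Deforming the contour without changing the integral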


The Cauchy Integral Theorem has a very important consequence for the com- 
putation of contour integrals. It basically says that contours can be moved 
about (or deformed) without changing the result of the integral, provided 
that in doing so we never cross a point where the integrand ceases to be 
analytic. Let us illustrate this with a few examples. 

Let us compute the contour integral

$$\oint_E \frac{1}{z}\,dz ,$$

where $E$ is the positively-oriented ellipse $x^2 + 4y^2 = 1$
depicted in the figure. Earlier we computed the same
integral around a circular contour $C$ of radius 1, centred
at the origin, and we obtained

$$\oint_C \frac{1}{z}\,dz = 2\pi i .$$


We will argue, using the Cauchy Integral Theorem, that we get the same
answer whether we integrate along $E$ or along $C$. Consider the two domains
in the interior of the circle $C$ but in the exterior of the ellipse $E$. The
integrand is analytic everywhere in the complex plane except for the origin,
which lies outside these two regions. The Cauchy Integral Theorem says that
the contour integral vanishes along either of the two contours which make up
the boundary of these domains. Let us be more explicit and let us call these
contours $\Gamma_\pm$ as in the figure below.

[Figure: the contours $\Gamma_+$ and $\Gamma_-$.]

Then it is clear that

$$\oint_C \frac{1}{z}\,dz = \oint_{\Gamma_+} \frac{1}{z}\,dz + \oint_{\Gamma_-} \frac{1}{z}\,dz + \oint_E \frac{1}{z}\,dz .$$

Since the interior of $\Gamma_\pm$ is simply-connected and the integrand $\frac{1}{z}$ is analytic in
and on $\Gamma_\pm$, the Cauchy Integral Theorem says that

$$\oint_{\Gamma_\pm} \frac{1}{z}\,dz = 0 ,$$

whence

$$\oint_E \frac{1}{z}\,dz = \oint_C \frac{1}{z}\,dz .$$


In other words, we could deform the contour from E to C without altering 
the result of the integral because in doing so we do not pass over any point 
where the integrand ceases to be analytic. 
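The equality of the two integrals can be checked numerically. The sketch below is only an illustration under stated assumptions: the helper `contour_integral`, the parametrisations of $E$ and $C$ over $t \in [0,1]$ and the number of sample points are all choices made here, and the integral is approximated by the midpoint rule.

```python
import cmath
import math


def contour_integral(f, z, dz, n=4000):
    """Midpoint-rule approximation of the integral of f along the closed
    contour t -> z(t), t in [0, 1]; dz is the derivative z'(t)."""
    h = 1.0 / n
    return sum(f(z((k + 0.5) * h)) * dz((k + 0.5) * h) for k in range(n)) * h


# Positively oriented ellipse x^2 + 4 y^2 = 1 (semi-axes 1 and 1/2).
ellipse = lambda t: complex(math.cos(2 * math.pi * t), 0.5 * math.sin(2 * math.pi * t))
d_ellipse = lambda t: 2 * math.pi * complex(-math.sin(2 * math.pi * t),
                                            0.5 * math.cos(2 * math.pi * t))

# Positively oriented unit circle centred at the origin.
circle = lambda t: cmath.exp(2j * math.pi * t)
d_circle = lambda t: 2j * math.pi * cmath.exp(2j * math.pi * t)

f = lambda z: 1 / z
I_E = contour_integral(f, ellipse, d_ellipse)
I_C = contour_integral(f, circle, d_circle)
print(I_E, I_C, 2j * math.pi)
```

Both approximations should agree with $2\pi i$ up to rounding, in line with the deformation argument.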

Let us illustrate this with another example, which generalises this one.
Let $\Gamma$ be any positively-oriented loop in the complex plane, let $z_0$ be any
complex number which does not lie on $\Gamma$, and consider the following contour
integral

$$\oint_\Gamma \frac{1}{z - z_0}\,dz .$$

We must distinguish two possibilities: $z_0$ is in the interior of $\Gamma$ or in the
exterior. In the latter case, the integral is zero because the integrand is
analytic everywhere but at $z_0$, hence if $z_0$ lies outside $\Gamma$, Cauchy’s Integral
Theorem applies. On the other hand, if $z_0$ is in the interior of $\Gamma$ we expect that
we should obtain a nonzero answer. After all, if $\Gamma$ were the circle $|z - z_0| = R > 0$,
then the same calculation as for the unit circle about the origin yields a value of $2\pi i$ for the
integral. In fact, as we will now show, this is the answer we get for any
positively-oriented loop containing $z_0$ in its interior.

In Figure 2.5 we have depicted the contour $\Gamma$ and a circular contour $C$
of radius $r$ about the point $z_0$. We have also depicted two pairs of points
$(P_1, P_2)$ and $(P_3, P_4)$, each pair having one point in each contour, as well as
straight line segments joining the points in each pair.


Figure 2.5: The contours $\Gamma$ and $C$ and some special points.
Now consider the following loop $\Gamma_1$ starting and ending at $P_1$, as illustrated
in Figure 2.6. We start at $P_1$ and go to $P_3$ via the top half of $\Gamma$, call
this $\Gamma_+$; then we go to $P_4$ along the straight line segment joining them, call
it $-\gamma_{34}$; then to $P_2$ via the upper half of $C$ in the negative sense, call it $-C_+$;
and then back to $P_1$ via the straight line segment joining $P_2$ and $P_1$, call it
$-\gamma_{12}$. The interior of this contour is simply-connected and does not contain
the point $z_0$. Therefore Cauchy’s Integral Theorem says that

$$\oint_{\Gamma_1} \frac{1}{z - z_0}\,dz = \left( \int_{\Gamma_+} + \int_{-\gamma_{34}} + \int_{-C_+} + \int_{-\gamma_{12}} \right) \frac{1}{z - z_0}\,dz = 0 ,$$

from where we deduce that

$$\int_{\Gamma_+} \frac{1}{z - z_0}\,dz = \left( \int_{\gamma_{34}} + \int_{C_+} + \int_{\gamma_{12}} \right) \frac{1}{z - z_0}\,dz .$$


Figure 2.6: The contours $\Gamma_1$ and $\Gamma_2$.


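Similarly consider the loop $\Gamma_2$ starting and ending at $P_3$. We start at $P_3$
and go to $P_1$ along the lower half of $\Gamma$, call it $\Gamma_-$; then we go to $P_2$ along
$\gamma_{12}$; then to $P_4$ via the lower half of the circular contour in the negative
sense, $-C_-$; and then finally back to $P_3$ along $\gamma_{34}$. By the same argument
as above, the interior of $\Gamma_2$ is simply-connected and $z_0$ lies in its exterior
domain. Therefore by the Cauchy Integral Theorem,

$$\oint_{\Gamma_2} \frac{1}{z - z_0}\,dz = \left( \int_{\Gamma_-} + \int_{\gamma_{12}} + \int_{-C_-} + \int_{\gamma_{34}} \right) \frac{1}{z - z_0}\,dz = 0 ,$$

from where we deduce that

$$\int_{\Gamma_-} \frac{1}{z - z_0}\,dz = \left( \int_{-\gamma_{34}} + \int_{C_-} + \int_{-\gamma_{12}} \right) \frac{1}{z - z_0}\,dz .$$

Putting the two results together, we find that

$$\begin{aligned}
\oint_\Gamma \frac{1}{z - z_0}\,dz &= \int_{\Gamma_+} \frac{1}{z - z_0}\,dz + \int_{\Gamma_-} \frac{1}{z - z_0}\,dz \\
&= \int_{C_+} \frac{1}{z - z_0}\,dz + \int_{C_-} \frac{1}{z - z_0}\,dz \\
&= \oint_C \frac{1}{z - z_0}\,dz \\
&= 2\pi i .
\end{aligned}$$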


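In summary, we find that if $\Gamma$ is any positively-oriented loop in the complex
plane and $z_0$ a point not on $\Gamma$, then

$$\oint_\Gamma \frac{1}{z - z_0}\,dz = \begin{cases} 2\pi i & \text{for } z_0 \text{ in the interior of } \Gamma \text{; and} \\ 0 & \text{otherwise.} \end{cases} \tag{2.32}$$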


In the following section we will generalise this formula in a variety of ways. 


2.2.5 Cauchy’s Integral Formula 


In this section we present several generalisations of the formula (2.32). Let
$f(z)$ be analytic in a simply-connected domain $D$, and let $\Gamma$ be a positively-oriented
loop in $D$. Let $z_0$ be any point in the interior of $\Gamma$. Then the Cauchy
Integral Formula reads

$$f(z_0) = \frac{1}{2\pi i} \oint_\Gamma \frac{f(z)}{z - z_0}\,dz . \tag{2.33}$$


This is a remarkable formula. It says that an analytic function in a simply- 
connected domain is determined by its behaviour on the boundary. In other 
words, if two analytic functions f(z) and g(z) agree on the boundary of a 
simply-connected domain they agree everywhere in the domain. 
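To see the formula at work, one can try to recover an interior value of a function from samples on the boundary alone. The following Python sketch does this for $f(z) = e^z$ on the unit circle; the function name `cauchy_formula`, the choice of $f$, the point $z_0$ and the midpoint-rule discretisation are assumptions made for illustration.

```python
import cmath
import math


def cauchy_formula(f, z0, centre=0.0, radius=1.0, n=2000):
    """Approximate (1 / (2 pi i)) times the contour integral of
    f(z) / (z - z0) around the positively oriented circle
    |z - centre| = radius, using n midpoint samples."""
    total = 0j
    for k in range(n):
        t = (k + 0.5) / n
        w = radius * cmath.exp(2j * math.pi * t)   # z(t) - centre
        z = centre + w
        dz = 2j * math.pi * w / n                  # z'(t) dt
        total += f(z) / (z - z0) * dz
    return total / (2j * math.pi)


z0 = 0.3 + 0.2j                                    # a point inside the unit circle
approx = cauchy_formula(cmath.exp, z0)
print(approx, cmath.exp(z0))
```

If $z_0$ is moved outside the circle, the same routine should return a value close to zero, since the integrand is then analytic in and on the contour.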


© © Cauchy’s Integral Formula is a mathematical analogue of a notion that is very much in
vogue in today’s theoretical physics, namely ‘holography’. You all know what the idea of
an optical hologram is: it is a two-dimensional film which contains enough information to
reconstruct (optically) a three-dimensional object. In theoretical physics, holography is
exemplified in the celebrated formula of Bekenstein-Hawking for the entropy of a black
hole. On the one hand, we know from Boltzmann’s formula that the entropy of a statistical
mechanical system is a measure of the density of states of the system. The black-hole
entropy formula says that the entropy of a black hole is proportional to the area of the
horizon. In simple terms, the horizon of the black hole is the surface within which light
can no longer escape the gravitational attraction of the black hole. The entropy formula
is holographic because it tells us that the degrees of freedom of a three-dimensional object
like a black hole are determined from the properties of a two-dimensional system: the
horizon, just like with the optical hologram. The “Holographic Principle” roughly states
that any theory of quantum gravity, i.e., a theory which can explain the microscopic
origin of the entropy of the black hole, must be able to explain the entropy formula and
hence be holographic. The Cauchy Integral Formula is holographic in the sense that an
analytic function in the plane (which is two-dimensional) is determined by its behaviour
on contours (which are one-dimensional).


Notice that by equation (2.32), we have that

$$f(z_0) = \frac{1}{2\pi i} \oint_\Gamma \frac{f(z_0)}{z - z_0}\,dz ,$$

whence we will have proven the Cauchy Integral Formula if we can show that

$$\oint_\Gamma \frac{f(z) - f(z_0)}{z - z_0}\,dz = 0 .$$


As a first step in proving this result, let us use the Cauchy Integral Theorem
to conclude that the above integral can be computed along a small circle $C_r$
of radius $r$ about $z_0$ without changing its value:

$$\oint_\Gamma \frac{f(z) - f(z_0)}{z - z_0}\,dz = \oint_{C_r} \frac{f(z) - f(z_0)}{z - z_0}\,dz .$$

Moreover since the radius of the circle does not matter, we are free to take
the limit in which the radius goes to zero, so that:

$$\oint_\Gamma \frac{f(z) - f(z_0)}{z - z_0}\,dz = \lim_{r \to 0} \oint_{C_r} \frac{f(z) - f(z_0)}{z - z_0}\,dz .$$


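Let us parametrise $C_r$ by $z(t) = z_0 + r \exp(2\pi i t)$ for $t \in [0,1]$. Then

$$\begin{aligned}
\oint_{C_r} \frac{f(z) - f(z_0)}{z - z_0}\,dz &= \int_0^1 \frac{f(z(t)) - f(z_0)}{r e^{2\pi i t}}\, 2\pi i\, r e^{2\pi i t}\,dt \\
&= 2\pi i \int_0^1 \left( f(z(t)) - f(z_0) \right) dt .
\end{aligned}$$

Let us estimate the integral. Using (2.24) we find

$$\left| \oint_{C_r} \frac{f(z) - f(z_0)}{z - z_0}\,dz \right| = 2\pi \left| \int_0^1 \left( f(z(t)) - f(z_0) \right) dt \right| \le 2\pi \int_0^1 \left| f(z(t)) - f(z_0) \right| dt \le 2\pi \max_{|z - z_0| = r} \left| f(z) - f(z_0) \right| .$$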


Because $f$ is continuous at $z_0$, that is, $f(z) \to f(z_0)$ as $z \to z_0$, the limit as
$r \to 0$ of $|f(z) - f(z_0)|$ is zero, whence

$$\lim_{r \to 0} \oint_{C_r} \frac{f(z) - f(z_0)}{z - z_0}\,dz = 0 .$$

Formally, continuity of $f$ at $z_0$ says that given any $\varepsilon > 0$ there is a $\delta > 0$ such that
$|f(z) - f(z_0)| < \varepsilon$ whenever $|z - z_0| < \delta$. Since we are interested in the limit $r \to 0$, we
can always take $r$ smaller than $\delta$, so that $|f(z) - f(z_0)|$ on $C_r$ is smaller than any given $\varepsilon$. Therefore,
$\lim_{r \to 0} |f(z) - f(z_0)| = 0$.


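Now let us do something “deep.” We will change notation in the Cauchy
Integral Formula (2.33) and rewrite it as

$$f(z) = \frac{1}{2\pi i} \oint_\Gamma \frac{f(\zeta)}{\zeta - z}\,d\zeta .$$

All we have done is change the name of the variable of integration (Shakespeare’s
Theorem again!); but as a result we have obtained an integral representation
of an analytic function which suggests a way to take its derivative
simply by sneaking the derivative inside the integral:

$$\begin{aligned}
f'(z) &= \frac{1}{2\pi i} \oint_\Gamma \frac{f(\zeta)}{(\zeta - z)^2}\,d\zeta \\
f''(z) &= \frac{2}{2\pi i} \oint_\Gamma \frac{f(\zeta)}{(\zeta - z)^3}\,d\zeta \\
&\;\;\vdots \\
f^{(n)}(z) &= \frac{n!}{2\pi i} \oint_\Gamma \frac{f(\zeta)}{(\zeta - z)^{n+1}}\,d\zeta .
\end{aligned}$$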


Of course such manipulations have to be justified, and we will see that indeed 
this is correct. Given that we are going to spend the effort in justifying this 
procedure, let us at least get something more out of it. 


Integral representation for analytic functions 


We already have at our disposal quite a number of analytic functions: rational 
functions, exponential and related functions, logarithm and complex powers. 
To some extent these are complex versions of functions with which we are 
familiar from real calculus. In this section we will learn of yet another way 
of constructing analytic functions. Functions constructed in this way usually 
do not have names, since anonymity is the fate which befalls most functions. 
But by the same token, this means that the method below is a powerful 
way to construct new analytic functions, or to determine that a function is 
analytic. 

Let $g$ be a function which is continuous on some contour $\Gamma$ which need
not be closed. Let $z$ be any complex number not contained in $\Gamma$, and define
the following function:

$$G(z) = \int_\Gamma \frac{g(\zeta)}{\zeta - z}\,d\zeta . \tag{2.34}$$

We claim that $G(z)$ is analytic except possibly on $\Gamma$, and

$$G'(z) = \int_\Gamma \frac{g(\zeta)}{(\zeta - z)^2}\,d\zeta . \tag{2.35}$$
This generalises the above discussion in two important ways: g need not be 
analytic (just continuous) and the contour need not be closed. 
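To see if $G(z)$ is analytic we need to investigate whether the derivative
$G'(z)$ exists and is well-defined. By definition,

$$\begin{aligned}
G'(z) &= \lim_{\Delta z \to 0} \frac{G(z + \Delta z) - G(z)}{\Delta z} \\
&= \lim_{\Delta z \to 0} \frac{1}{\Delta z} \int_\Gamma \left( \frac{g(\zeta)}{\zeta - z - \Delta z} - \frac{g(\zeta)}{\zeta - z} \right) d\zeta \\
&= \lim_{\Delta z \to 0} \frac{1}{\Delta z} \int_\Gamma \frac{g(\zeta)\,\Delta z}{(\zeta - z - \Delta z)(\zeta - z)}\,d\zeta \\
&= \lim_{\Delta z \to 0} \int_\Gamma \frac{g(\zeta)}{(\zeta - z - \Delta z)(\zeta - z)}\,d\zeta .
\end{aligned}$$

Again, we would be done if we could simply take the limit inside the integral:

$$G'(z) = \int_\Gamma \lim_{\Delta z \to 0} \frac{g(\zeta)}{(\zeta - z - \Delta z)(\zeta - z)}\,d\zeta = \int_\Gamma \frac{g(\zeta)}{(\zeta - z)^2}\,d\zeta .$$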


This can be justified (see below), so we are allowed to do so and recover what 
we were after. The formula (2.34) defines an integral representation for the 
analytic function G(z). 
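The claim (2.35) can be probed numerically in a case where $g$ is continuous but not analytic and the contour is not closed. In the sketch below, the contour is taken to be the real segment $[-1, 1]$ and $g(\zeta) = |\zeta|$; these choices, the helper name `integral_on_segment` and the midpoint-rule discretisation are assumptions for illustration only. The difference quotient of $G$ is compared, along two different directions of $\Delta z$, with the right-hand side of (2.35).

```python
def integral_on_segment(g, z, power, n=20000):
    """Midpoint-rule approximation of the integral of
    g(zeta) / (zeta - z)**power over the real segment [-1, 1]."""
    h = 2.0 / n
    total = 0j
    for k in range(n):
        zeta = -1.0 + (k + 0.5) * h
        total += g(zeta) / (zeta - z) ** power
    return total * h


g = abs                                          # continuous, but not analytic
G = lambda z: integral_on_segment(g, z, 1)       # as in (2.34)
G_prime_formula = lambda z: integral_on_segment(g, z, 2)   # as in (2.35)

z = 0.5j                                         # a point off the contour
step = 1e-5
# Central difference quotients along the real and the imaginary direction.
G_prime_real = (G(z + step) - G(z - step)) / (2 * step)
G_prime_imag = (G(z + 1j * step) - G(z - 1j * step)) / (2j * step)
print(G_prime_real, G_prime_imag, G_prime_formula(z))
```

That the two difference quotients agree with each other, and not only with the formula, is what one expects of a complex derivative: the limit must not depend on the direction in which $\Delta z$ tends to zero.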


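© Let us show that one can take the limit inside the integral, so that

$$\lim_{\Delta z \to 0} \int_\Gamma \frac{g(\zeta)}{(\zeta - z - \Delta z)(\zeta - z)}\,d\zeta = \int_\Gamma \frac{g(\zeta)}{(\zeta - z)^2}\,d\zeta .$$

Equivalently we would like to show that in the limit $\Delta z \to 0$, the difference

$$\int_\Gamma \left( \frac{g(\zeta)}{(\zeta - z - \Delta z)(\zeta - z)} - \frac{g(\zeta)}{(\zeta - z)^2} \right) d\zeta$$

vanishes. We can rewrite this difference as

$$\Delta z \int_\Gamma \frac{g(\zeta)}{(\zeta - z - \Delta z)(\zeta - z)^2}\,d\zeta ,$$

which we would like to vanish as $\Delta z \to 0$. By equation (2.24), we have that

$$\left| \Delta z \int_\Gamma \frac{g(\zeta)}{(\zeta - z - \Delta z)(\zeta - z)^2}\,d\zeta \right| \le |\Delta z| \int_\Gamma \frac{|g(\zeta)|}{|\zeta - z - \Delta z|\,|\zeta - z|^2}\,|d\zeta| \le |\Delta z| \max_{\zeta \in \Gamma} \frac{|g(\zeta)|}{|\zeta - z - \Delta z|\,|\zeta - z|^2}\;\ell(\Gamma) ,$$

where we have used equation (2.27) for the length $\ell(\Gamma)$ of the contour.

Since $g(\zeta)$ is continuous on $\Gamma$, $|g(\zeta)|$ is bounded there: $|g(\zeta)| \le M$, for some positive real
$M$.

[Figure: the point $z$ at distance at least $\delta$ from the contour $\Gamma$, together with the displacement $\Delta z$.]

Because $z$ is not on $\Gamma$, any point $\zeta$ on $\Gamma$ is at least a certain distance $\delta$ from $z$: $|\zeta - z| \ge \delta > 0$,
as shown in the above figure. Now by the triangle inequality (2.1),

$$|\zeta - z| = |\zeta - z - \Delta z + \Delta z| \le |\zeta - z - \Delta z| + |\Delta z| ,$$

whence

$$|\zeta - z - \Delta z| \ge |\zeta - z| - |\Delta z| .$$

Since we are taking the limit $\Delta z \to 0$, we can choose $|\Delta z| \le \frac{1}{2}\delta$ so that

$$|\zeta - z - \Delta z| \ge \delta - \tfrac{1}{2}\delta = \tfrac{1}{2}\delta .$$

Therefore putting it all together we find that

$$\left| \int_\Gamma \frac{g(\zeta)}{(\zeta - z - \Delta z)(\zeta - z)^2}\,d\zeta \right| \le \frac{2 M \ell(\Gamma)}{\delta^3} .$$

Therefore

$$\lim_{\Delta z \to 0} \left| \Delta z \int_\Gamma \frac{g(\zeta)}{(\zeta - z - \Delta z)(\zeta - z)^2}\,d\zeta \right| \le \lim_{\Delta z \to 0} |\Delta z|\,\frac{2 M \ell(\Gamma)}{\delta^3} = 0 .$$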


This is as good a place as any to mention another way of writing the triangle inequality
(2.1), which is sometimes more useful and which was used above:

$$|z + w| \ge |z| - |w| . \tag{2.36}$$

To obtain the second version of the triangle inequality from the first we simply make the
following substitution: $z_1 + z_2 = z$, and $z_2 = -w$, so that $z_1 = z + w$. Then we find from
(2.1) that $|z| \le |z + w| + |-w| = |z + w| + |w|$, which can be rewritten as (2.36).


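The same argument shows that if we define

$$H(z) = \int_\Gamma \frac{g(\zeta)}{(\zeta - z)^n}\,d\zeta , \tag{2.37}$$

where $n$ is a positive integer, then $H$ is analytic and its derivative is given
by

$$H'(z) = n \int_\Gamma \frac{g(\zeta)}{(\zeta - z)^{n+1}}\,d\zeta . \tag{2.38}$$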


The generalised Cauchy Integral Formula 


This has an important consequence: if $f$ is analytic in a neighbourhood
of $z_0$, then so are all its derivatives $f^{(n)}$. To prove this simply notice that
if $f$ is analytic in a neighbourhood of $z_0$, there is some $\varepsilon > 0$ such that $f$ is
analytic in and on the circle $C$ of radius $\varepsilon$ centred at $z_0$; that is, in the closed
disk $|\zeta - z_0| \le \varepsilon$. Therefore for any $z$ in the interior of the circle, that is,
such that $|z - z_0| < \varepsilon$, we have the Cauchy Integral representation

$$f(z) = \frac{1}{2\pi i} \oint_C \frac{f(\zeta)}{\zeta - z}\,d\zeta .$$

But this integral representation is of the form (2.34), whence its derivative
is given by the analogue of equation (2.35):

$$f'(z) = \frac{1}{2\pi i} \oint_C \frac{f(\zeta)}{(\zeta - z)^2}\,d\zeta .$$

But this is of the general form (2.37) (with $n = 2$), whence by the above
results, $f'(z)$ is an analytic function and its derivative is given by the analogue
of (2.38):

$$f''(z) = \frac{2}{2\pi i} \oint_C \frac{f(\zeta)}{(\zeta - z)^3}\,d\zeta ,$$

which again follows the pattern (2.37). Continuing in this fashion we deduce
that $f'$, $f''$, ... are analytic in the open $\varepsilon$-disk about $z_0$.

In summary, an analytic function is infinitely differentiable, its derivatives
being given by the generalised Cauchy Integral Formula:

$$f^{(n)}(z) = \frac{n!}{2\pi i} \oint_\Gamma \frac{f(\zeta)}{(\zeta - z)^{n+1}}\,d\zeta . \tag{2.39}$$

Notice that if we put $n = 0$ in this formula, define $0! = 1$ and understand
the zeroth derivative $f^{(0)}$ as the function $f$ itself, then this is precisely the
Cauchy Integral Formula.


Infinite differentiability of harmonic functions. 


The generalised Cauchy Integral Formula can also be turned around in order
to compute contour integrals. Hence if $f$ is analytic in and on a positively
oriented loop $\Gamma$, and if $z_0$ is a point in the interior of $\Gamma$, then

$$\oint_\Gamma \frac{f(z)}{(z - z_0)^{n+1}}\,dz = \frac{2\pi i}{n!}\, f^{(n)}(z_0) . \tag{2.40}$$

For example, let us compute the following contour integral

$$\oint_\Gamma \frac{e^{5z}}{z^3}\,dz ,$$

where $\Gamma$ is the positively oriented unit circle $|z| = 1$. This integral is of the
form (2.40) with $n = 2$, $f(z) = e^{5z}$, which is entire and hence, certainly
analytic in and on the contour, and with $z_0 = 0$, which lies in the interior of
the contour. Therefore by (2.40) we have

$$\oint_\Gamma \frac{e^{5z}}{z^3}\,dz = 2\pi i\,\frac{1}{2!} \left. \frac{d^2}{dz^2} \left( e^{5z} \right) \right|_{z=0} = 25\pi i .$$
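This value is easy to confirm numerically. The following Python sketch approximates the contour integral directly on the unit circle; the helper name `circle_integral` and the midpoint-rule discretisation are assumptions of the sketch rather than anything prescribed in these notes.

```python
import cmath
import math


def circle_integral(f, centre, radius, n=2000):
    """Midpoint-rule approximation of the contour integral of f around the
    positively oriented circle |z - centre| = radius."""
    total = 0j
    for k in range(n):
        w = radius * cmath.exp(2j * math.pi * (k + 0.5) / n)   # z - centre
        total += f(centre + w) * (2j * math.pi * w / n)        # f(z) z'(t) dt
    return total


value = circle_integral(lambda z: cmath.exp(5 * z) / z ** 3, 0.0, 1.0)
print(value, 25j * math.pi)
```

The two printed numbers should agree up to rounding, in accordance with (2.40) for $n = 2$.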

Let us consider a more complicated example. Let us compute the contour
integral

$$\oint_\Gamma \frac{2z+1}{z(z-1)^2}\,dz ,$$

where $\Gamma$ is the contour depicted in Figure 2.7. Two things prevent us from applying
the generalised Cauchy Integral Formula: the contour is not a loop
(indeed it is not simple) and the integrand is not of the form $g(z)/(z - z_0)^n$
where $g(z)$ is analytic inside the contour. This last problem could be solved
by rewriting the integrand using partial fractions:

$$\frac{2z+1}{z(z-1)^2} = \frac{3}{(z-1)^2} - \frac{1}{z-1} + \frac{1}{z} . \tag{2.41}$$

However we are still faced with a contour which is not simple.


Figure 2.7: The contour $\Gamma$ and an equivalent pair of contours $\{\Gamma_0, \Gamma_1\}$.


This problem can be circumvented by noticing that the smooth contour $\Gamma$
can be written as a piecewise smooth contour with two smooth components,
both starting and ending at the point of self-intersection of $\Gamma$. The first such
contour is the left lobe of $\Gamma$, which is a negatively oriented loop about $z = 0$,
and the second is the right lobe of $\Gamma$, which is a positively oriented loop
about $z = 1$. Because the integrand is analytic everywhere but at $z = 0$ and
$z = 1$, the Cauchy Integral Theorem tells us that we get the same result by
integrating around the circular contours $\Gamma_0$ and $\Gamma_1$ in Figure 2.7. In other
words,

$$\oint_\Gamma \frac{2z+1}{z(z-1)^2}\,dz = \oint_{\Gamma_0} \frac{2z+1}{z(z-1)^2}\,dz + \oint_{\Gamma_1} \frac{2z+1}{z(z-1)^2}\,dz .$$


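We can now evaluate this in either of two ways. Using the partial fraction
decomposition (2.41) of the integrand, one finds

$$\oint_{\Gamma_0} \frac{2z+1}{z(z-1)^2}\,dz = \oint_{\Gamma_0} \frac{1}{z}\,dz = -\oint_{-\Gamma_0} \frac{1}{z}\,dz = -2\pi i ,$$

$$\oint_{\Gamma_1} \frac{2z+1}{z(z-1)^2}\,dz = \oint_{\Gamma_1} \frac{3}{(z-1)^2}\,dz - \oint_{\Gamma_1} \frac{1}{z-1}\,dz = 0 - 2\pi i = -2\pi i ;$$

whence

$$\oint_\Gamma \frac{2z+1}{z(z-1)^2}\,dz = -4\pi i .$$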

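Alternatively we notice that

$$\oint_{\Gamma_0} \frac{2z+1}{z(z-1)^2}\,dz = \oint_{\Gamma_0} \frac{\;\frac{2z+1}{(z-1)^2}\;}{z}\,dz = -2\pi i \left. \frac{2z+1}{(z-1)^2} \right|_{z=0} = -2\pi i ,$$

where we have used the fact that $\frac{2z+1}{(z-1)^2}$ is analytic in and on $\Gamma_0$ and the
Cauchy Integral Formula after taking into account that $\Gamma_0$ is negatively oriented.
Similarly, one has

$$\oint_{\Gamma_1} \frac{2z+1}{z(z-1)^2}\,dz = \oint_{\Gamma_1} \frac{\;\frac{2z+1}{z}\;}{(z-1)^2}\,dz = 2\pi i \left. \frac{d}{dz} \left( \frac{2z+1}{z} \right) \right|_{z=1} = -2\pi i ,$$

where we have used that $\frac{2z+1}{z}$ is analytic in and on $\Gamma_1$, and the generalised
Cauchy Integral formula (with $n = 1$). Therefore again

$$\oint_\Gamma \frac{2z+1}{z(z-1)^2}\,dz = -4\pi i .$$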
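Both evaluations can be cross-checked by integrating numerically around the two circular contours. In the Python sketch below the radii (both $1/2$), the helper name `circle_integral` and its `orientation` parameter are assumptions made for illustration; $\Gamma_0$ is traversed in the negative sense and $\Gamma_1$ in the positive sense, as in the text.

```python
import cmath
import math


def circle_integral(f, centre, radius, orientation=+1, n=4000):
    """Midpoint-rule approximation of the contour integral of f around the
    circle |z - centre| = radius; orientation = +1 is counter-clockwise
    (positive), orientation = -1 is clockwise (negative)."""
    total = 0j
    for k in range(n):
        w = radius * cmath.exp(orientation * 2j * math.pi * (k + 0.5) / n)
        total += f(centre + w) * (orientation * 2j * math.pi * w / n)
    return total


f = lambda z: (2 * z + 1) / (z * (z - 1) ** 2)

I0 = circle_integral(f, 0.0, 0.5, orientation=-1)   # Gamma_0: negative sense about z = 0
I1 = circle_integral(f, 1.0, 0.5, orientation=+1)   # Gamma_1: positive sense about z = 1
print(I0, I1, I0 + I1)
```

Each of the two contributions should come out close to $-2\pi i$, and their sum close to $-4\pi i$.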


Morera’s Theorem 


Finally we discuss a converse of the Cauchy Integral Theorem, known as
Morera’s Theorem. Suppose that $f$ is continuous in a domain $D$ and has
an antiderivative $F$ in $D$. This means that $F$ is analytic, and by what we
have just shown, so is $f(z) = F'(z)$. Therefore we have just shown that if
$f(z)$ is continuous with an antiderivative, then $f$ is analytic. Now from the
path independence lemma, $f$ has an antiderivative if and only if all its loop
integrals in $D$ vanish:

$$\oint_\Gamma f(z)\,dz = 0 .$$

Therefore we arrive at Morera’s Theorem which states that: if $f(z)$ is continuous
in $D$ and all the loop integrals of $f(z)$ in $D$ vanish, then $f$ is analytic.
This theorem will be of use later on.


2.2.6 Liouville’s Theorem and its applications 


The generalised Cauchy Integral Formula is one of the cornerstones of com- 
plex analysis, as it has a number of very useful corollaries. An immediate ap- 
plication of the generalised Cauchy Integral Formula is the so-called Cauchy 
estimates for the derivatives of an analytic function. These estimates will 
play an important role in the remainder of this section. 

Suppose that $f(z)$ is analytic in some domain $D$ containing a circle $C$
of radius $R$ centred about $z_0$. Suppose moreover that $|f(z)| \le M$ for all $z$
on the circle $C$. We can then use the generalised Cauchy Integral Formula
(2.39) to obtain a bound for the derivatives of $f$ at $z_0$:

$$\left| f^{(n)}(z_0) \right| = \left| \frac{n!}{2\pi i} \oint_C \frac{f(z)}{(z - z_0)^{n+1}}\,dz \right| \le \frac{n!}{2\pi} \oint_C \frac{|f(z)|}{|z - z_0|^{n+1}}\,|dz| ,$$

where we have used (2.28) to arrive at the inequality. On the circle, $|z - z_0| = R$
and $|f(z)| \le M$, whence

$$\left| f^{(n)}(z_0) \right| \le \frac{n!}{2\pi}\,\frac{M}{R^{n+1}} \oint_C |dz| ,$$

which, using that the length of the contour is $2\pi R$, can be rewritten neatly
as

$$\left| f^{(n)}(z_0) \right| \le \frac{n!\,M}{R^n} . \tag{2.42}$$

This inequality is known as the Cauchy estimate.
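The estimate (2.42) can be illustrated numerically. The Python sketch below takes $f(z) = e^z$ and $z_0 = 0$ (so that every derivative at $z_0$ equals 1), computes $f^{(n)}(z_0)$ from (2.39) on a circle of radius $R$, and compares it with $n!\,M/R^n$, where $M$ is estimated as the largest sampled value of $|f|$ on the circle. The function name, the choice of $f$ and the sampling are assumptions of this sketch.

```python
import cmath
import math


def derivative_and_bound(f, z0, R, order, n=2000):
    """Return (|f^(order)(z0)|, order! * M / R**order), where the derivative
    is computed from the generalised Cauchy Integral Formula on the circle
    |z - z0| = R and M is the largest sampled value of |f| on that circle."""
    total = 0j
    M = 0.0
    for k in range(n):
        w = R * cmath.exp(2j * math.pi * (k + 0.5) / n)      # z - z0
        fz = f(z0 + w)
        M = max(M, abs(fz))
        total += fz / w ** (order + 1) * (2j * math.pi * w / n)
    derivative = math.factorial(order) * total / (2j * math.pi)
    return abs(derivative), math.factorial(order) * M / R ** order


for order in range(4):
    for R in (0.5, 1.0, 2.0):
        print(order, R, derivative_and_bound(cmath.exp, 0.0, R, order))
```

In each printed pair the first number should be close to 1 and should not exceed the second. How sharp the bound is depends on the choice of $R$.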

As an immediate corollary of this estimate suppose that $f$ is analytic in the
whole complex plane (i.e., that $f$ is an entire function) and that it is bounded,
so that $|f(z)| \le M$ for all $z$. Then from the Cauchy estimate, at any point
$z_0$ in the complex plane, its derivative is bounded by $|f'(z_0)| \le M/R$. But
because the function is entire, we can take $R$ as large as we wish. Now
given any number $\varepsilon > 0$, however small, there is always an $R$ large enough
for which $M/R < \varepsilon$, so that $|f'(z_0)| < \varepsilon$. Therefore $|f'(z_0)| = 0$, whence
$f'(z_0) = 0$. Since this is true for all $z_0$ in the complex plane, we have proven
Liouville’s theorem:


a bounded entire function is constant. 


This does not violate our experience since the only entire functions we have 
met are polynomials and the exponential and functions we can make out 
of them by multiplication, linear combinations and compositions, and these 
functions are all clearly not bounded. 

Indeed, suppose that $P(z)$ is a polynomial of order $N$; that is,

$$P(z) = z^N + a_{N-1} z^{N-1} + \cdots + a_1 z + a_0 .$$

Then intuitively, for large $z$ we expect that $P(z)$ should go as $z^N$, since
the largest power dominates the other ones. The precise statement, to be
proven below, is that there exists $R > 0$ large enough such that for $|z| \ge R$,
$|P(z)| \ge c|z|^N$, where $0 < c < 1$ depends on $R$ in such a way that as $R$ tends
to $\infty$, $c$ tends to 1.
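Before turning to the proof, the statement can be tested numerically on an example. The Python sketch below uses the constant $c = (R - NA)/R$, with $A$ the largest of 1 and the moduli of the coefficients, which is the value obtained in the proof that follows; the sample polynomial $z^3 + 2z^2 - 5z + 3$, the radius $R = 30$ and the function names are illustrative assumptions.

```python
import cmath
import math


def lower_bound_constant(coeffs, R):
    """coeffs = [a_0, ..., a_{N-1}] of a monic polynomial of order N.
    Returns c = (R - N*A) / R with A = max(|a_0|, ..., |a_{N-1}|, 1)."""
    N = len(coeffs)
    A = max([abs(a) for a in coeffs] + [1.0])
    return (R - N * A) / R


def poly(coeffs, z):
    """Evaluate z^N + a_{N-1} z^(N-1) + ... + a_1 z + a_0 by Horner's rule."""
    result = 1.0 + 0j                    # leading coefficient of the monic polynomial
    for a in reversed(coeffs):
        result = result * z + a
    return result


coeffs = [3.0, -5.0, 2.0]                # P(z) = z^3 + 2 z^2 - 5 z + 3
R = 30.0                                 # must exceed N*A = 15 for c to be positive
c = lower_bound_constant(coeffs, R)

# Smallest value of |P(z)| / |z|^N over sample points with |z| >= R.
worst = min(
    abs(poly(coeffs, r * cmath.exp(2j * math.pi * k / 360))) / r ** 3
    for r in (R, 1.5 * R, 10 * R)
    for k in range(360)
)
print(c, worst)
```

The smallest sampled ratio $|P(z)|/|z|^N$ should not fall below $c$. Increasing $R$ should push $c$ towards 1.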


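© Let $P(z)$ be the above polynomial and let $A \ge 1$ denote the largest of the moduli of
the coefficients of the polynomial: $A = \max\{|a_0|, |a_1|, \ldots, |a_{N-1}|, 1\}$. Then let us rewrite
the polynomial as $P(z) = z^N \left( 1 + a_{N-1}/z + \cdots + a_0/z^N \right)$. Now by the triangle inequality
(2.36),

$$\left| 1 + \frac{a_{N-1}}{z} + \cdots + \frac{a_1}{z^{N-1}} + \frac{a_0}{z^N} \right| \ge 1 - \left| \frac{a_{N-1}}{z} + \cdots + \frac{a_1}{z^{N-1}} + \frac{a_0}{z^N} \right| ,$$

and by the triangle inequality (2.1),

$$\left| \frac{a_{N-1}}{z} + \cdots + \frac{a_1}{z^{N-1}} + \frac{a_0}{z^N} \right| \le \frac{|a_{N-1}|}{|z|} + \cdots + \frac{|a_1|}{|z|^{N-1}} + \frac{|a_0|}{|z|^N} \le \frac{A}{|z|} + \cdots + \frac{A}{|z|^{N-1}} + \frac{A}{|z|^N} .$$

Now take $|z| \ge 1$ so that $|z|^N \ge |z|^{N-1} \ge \cdots \ge |z|$. Then,

$$\left| \frac{a_{N-1}}{z} + \cdots + \frac{a_1}{z^{N-1}} + \frac{a_0}{z^N} \right| \le \frac{NA}{|z|} .$$

Therefore,

$$\left| 1 + \frac{a_{N-1}}{z} + \cdots + \frac{a_1}{z^{N-1}} + \frac{a_0}{z^N} \right| \ge 1 - \frac{NA}{|z|} ,$$

so that for $|z| \ge R$, with $R > NA$,

$$\left| 1 + \frac{a_{N-1}}{z} + \cdots + \frac{a_1}{z^{N-1}} + \frac{a_0}{z^N} \right| \ge 1 - \frac{NA}{R} = \frac{R - NA}{R} .$$

Finally then,

$$|P(z)| = |z|^N \left| 1 + \frac{a_{N-1}}{z} + \cdots + \frac{a_1}{z^{N-1}} + \frac{a_0}{z^N} \right| \ge \frac{R - NA}{R}\,|z|^N .$$

Hence $c = (R - NA)/R < 1$ and as $R \to \infty$, $c \to 1$.

We are now able to prove the Fundamental Theorem of Algebra which
states that


every nonconstant polynomial has at least one zero.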


Indeed, let $P(z)$ be a polynomial and suppose that it does not have any zeros.
Then $1/P(z)$ is an entire function. If we manage to prove that this function
is bounded, then we can use Liouville’s theorem and conclude that $1/P(z)$,
and hence $P(z)$, would have to be constant. So let us try to prove that it is
bounded. Without loss of generality we can assume that the polynomial has
the form $P(z) = z^N + a_{N-1} z^{N-1} + \cdots + a_1 z + a_0$ for some $N$. Let $R$ be such
that for $|z| \ge R$, $|P(z)| \ge c|z|^N$, where $0 < c < 1$. Then, for $|z| \ge R$,

$$\left| \frac{1}{P(z)} \right| \le \frac{1}{c\,|z|^N} \le \frac{1}{c\,R^N} .$$

While for $|z| \le R$, the function $1/P(z)$, being continuous, is bounded
in this disk by some $M = \max_{|z| \le R} 1/|P(z)|$. Therefore $1/|P(z)|$ is bounded
above for all $z$ by the largest of $M$ and $1/(cR^N)$. Hence $1/P(z)$ is bounded.


© © I have always found this proof of the Fundamental Theorem of Algebra quite remarkable.
It is compelling evidence in favour of the vision of mathematics as a coherent whole, that
a purely algebraic statement like the Fundamental Theorem of Algebra can be proven in
a relatively elementary fashion using complex analysis. I hope that as physicists we can
be forgiven the vanity of thinking that this unity of mathematics stems from it being the
language of nature.


2.3 Series expansions for analytic functions 


This section ushers in the second half of this part of the course. The pur- 
pose of this section is to learn about the series representations for analytic 
functions. We will see that every function analytic in a disk can be approxi- 
mated by polynomials: the partial sums of its Taylor series. Similarly every 
function analytic in a punctured disk can be described by a Laurent series, 
a generalisation of the notion of a power series, where we also allow for neg- 
ative powers. This will allow us to discuss the different types of singularities 
that an analytic function can have. This section is organised as follows: we 
start with a study of sequences and series of complex numbers and of complex 
functions and of different notions of convergence and methods of establishing 
convergence. We will then show that a function analytic in the neighbourhood
of a point can be approximated there by a power series: its Taylor
series. We will then discuss power series and prove that every power series
converges to an analytic function in its domain of convergence, and in fact is
the Taylor series of that function. Therefore the power series representation
of an analytic function is unique. We then introduce Laurent series, which
allow us to represent analytic functions around an isolated singularity. We
also prove that they are unique in a sense. We end the section with a discussion
of the different isolated singularities which an analytic function can
have.


2.3.1 Sequences and Series 


In this section we discuss sequences and series and the rudiments of the 
theory of convergence. This is necessary groundwork to be able to discuss 
the Taylor and Laurent series representations for analytic functions. 


Sequences 


By a sequence we mean an infinite set $\{z_0, z_1, z_2, z_3, \ldots\}$ of complex numbers.
It is often denoted $\{z_n\}$ where the index is understood to run over the
non-negative integers. Intuitively, a sequence $\{z_n\}$ converges to a complex
number $z$ if as $n$ increases, $z_n$ remains ever closer to $z$. A precise definition
is the following. A sequence $\{z_n\}$ is said to converge to $z$ (written $z_n \to z$
or $\lim_{n \to \infty} z_n = z$) if given any $\varepsilon > 0$, there exists an integer $N$, which may
depend on $\varepsilon$, such that for all $n \ge N$, $|z_n - z| < \varepsilon$. In other words, the “tail”
of the sequence remains arbitrarily close to $z$ provided we go sufficiently far
into it. A sequence which converges to some point is said to be convergent.
Convergence is clearly a property only of the tail of the sequence, in the sense
that two sequences which differ only in the first $N$ terms (any finite $N$) but
are identical afterwards will have the same convergence properties.

For example, the sequence $\{z_n = 1/n\}$ clearly converges to 0: $|z_n| = 1/n$
and we can make this as small as we like by taking $n$ as large as needed.

A sequence $\{z_n\}$ is said to satisfy the Cauchy criterion (or be a Cauchy
sequence) if it satisfies the following property: given any $\varepsilon > 0$ there exists
$N$ (again, depending on $\varepsilon$) such that $|z_n - z_m| < \varepsilon$ for all $n, m \ge N$. This
criterion simply requires that the elements in the sequence remain ever closer
to each other, not that they should converge to any point. Clearly, if a
sequence converges it is Cauchy: simply notice that adding and subtracting
$z$,

$$|z_n - z_m| = |(z_n - z) - (z_m - z)| \le |z_n - z| + |z_m - z|$$

by the triangle inequality (2.1). Hence if we want $z_n$ and $z_m$ to remain within
$\varepsilon$ of each other for $n, m$ larger than some $N$, we need just choose $N$ such that
$|z_n - z| < \varepsilon/2$ for all $n \ge N$.

What is a relatively deep result is that every Cauchy sequence is convergent. This is
essentially the fact that the complex numbers are complete. To prove this requires a more
careful axiomatisation of the real number system than we have time for.


Series 


By a series we mean a formal sum

$$\sum_{j=0}^{\infty} c_j = c_0 + c_1 + \cdots + c_j + \cdots$$

of complex numbers, $c_j$, called the coefficients. We say formal since just
because we can write something down does not mean it makes any sense: it
does not make much sense to add an infinite number of terms. What does
make sense is the following: define the $n$-th partial sum

$$S_n = \sum_{j=0}^{n} c_j = c_0 + c_1 + \cdots + c_{n-1} + c_n .$$

This defines a sequence $\{S_n\}$. Then we can analyse the limit as $n \to \infty$
of this sequence. If one exists, say $S_n \to S$, then we say that the series
converges to or sums to $S$, and we write

$$\sum_{j=0}^{\infty} c_j = S .$$

Otherwise we say that the series is divergent. Applying the Cauchy criterion
to the sequence of partial sums, we see that a necessary condition for the
convergence of a series is that the sequence of coefficients converge to 0.
Indeed, if $\{S_n\}$ is convergent, it is Cauchy, whence given any $\varepsilon > 0$, there
exists $N$ such that for all $n, m \ge N$, $|S_n - S_m| < \varepsilon$. Taking $m = n - 1$, we
see that

$$\left| \sum_{j=0}^{n} c_j - \sum_{j=0}^{n-1} c_j \right| = |c_n| < \varepsilon ,$$

for every $n > N$. Therefore the sequence $\{c_j\}$ converges to 0. We can
summarise this as follows

If $\sum_{j=0}^{\infty} c_j$ converges, then $\lim_{j \to \infty} c_j = 0$.


This is a necessary criterion for the convergence of a series, so it can be
used to conclude that a series is divergent, but not to conclude that it is
convergent. For example, consider the series

$$\sum_{j=0}^{\infty} \frac{j}{2j+1} . \tag{2.43}$$

It is clearly divergent because $j/(2j+1) \to \frac{1}{2}$. On the other hand consider
the series (we start at $j = 1$ for obvious reasons)

$$\sum_{j=1}^{\infty} \frac{1}{j} . \tag{2.44}$$


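Now the coefficients do converge to zero, but this series is actually divergent.
One way to see this is to notice that for every $n \geq 1$,

$$\sum_{j=1}^{n} \frac{1}{j} = \sum_{j=1}^{n} \int_j^{j+1} \frac{dx}{j} > \sum_{j=1}^{n} \int_j^{j+1} \frac{dx}{x} = \int_1^{n+1} \frac{dx}{x} = \log(n+1) ,$$

and $\lim_{n \to \infty} \log(n+1) = \infty$. On the other hand, the series

$$\sum_{j=1}^{\infty} \frac{1}{j^2}$$

does converge. One can argue in a similar style. Notice that for $j \geq 2$,

$$\frac{1}{j^2} = \int_{j-1}^{j} \frac{dx}{j^2} < \int_{j-1}^{j} \frac{dx}{x^2} = \frac{1}{j(j-1)} .$$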


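Hence, for all $n \geq 2$,

$$\sum_{j=1}^{n} \frac{1}{j^2} = 1 + \sum_{j=2}^{n} \frac{1}{j^2} < 1 + \sum_{j=2}^{n} \int_{j-1}^{j} \frac{dx}{x^2} = 1 + \int_1^{n} \frac{dx}{x^2} = 2 - \frac{1}{n} ,$$

so that in the limit,

$$\sum_{j=1}^{\infty} \frac{1}{j^2} \leq 2 .$$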


Indeed, we will be able to compute this sum very easily using contour
integration and it will turn out that $\sum_{j=1}^{\infty} \frac{1}{j^2} = \frac{\pi^2}{6} \approx 1.6449341$. Similarly, one
can show in the same way that the series

$$\sum_{j=1}^{\infty} \frac{1}{j^p}$$

converges for any $p > 1$. In fact, $p$ need not be an integer: any real number $p > 1$ will do.
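The two integral comparisons above are easy to check numerically. The following is only a sketch in Python (the helper name `partial_sum` and the cutoff `n` are choices made here for illustration, not part of the notes): it compares the harmonic partial sum with $\log(n+1)$ and the partial sum of $\sum 1/j^2$ with the bound $2 - 1/n$.

```python
import math

def partial_sum(p, n):
    """Return the n-th partial sum 1/1**p + 1/2**p + ... + 1/n**p."""
    return sum(1.0 / j**p for j in range(1, n + 1))

n = 10_000
harmonic = partial_sum(1, n)
basel = partial_sum(2, n)

# p = 1: the partial sum exceeds log(n + 1), which grows without bound.
print(harmonic, math.log(n + 1))
# p = 2: the partial sum stays below 2 - 1/n and creeps up towards pi^2/6.
print(basel, 2 - 1 / n, math.pi**2 / 6)
```

Of course a finite computation proves nothing about the limit; it merely illustrates the inequalities derived above.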
Establishing convergence


There are two useful tests for establishing the convergence of a series. The
first one is known as the Comparison Test: Suppose that $\sum_{j=0}^{\infty} M_j$ is a
convergent series whose coefficients are non-negative real numbers: $M_j \geq 0$.
Let $\sum_{j=0}^{\infty} c_j$ be such that $|c_j| \leq M_j$ for all sufficiently large $j$. Then $\sum_{j=0}^{\infty} c_j$
also converges.


© Prove the Comparison Test. 


Of course, in order to apply this test we need to have some examples
of convergent series to compare with. We have already seen the series
$\sum_{j=1}^{\infty} 1/j^p$, for $p > 1$, but perhaps the most useful series we will come across
is the geometric series $\sum_{j=0}^{\infty} c^j$, where $c$ is some complex number. To
investigate the convergence of this series, simply notice that $|c^j| = |c|^j$ and hence
the coefficient sequence $\{c^j\}$ converges to 0 if and only if $|c| < 1$. Thus we
let $|c| < 1$ from now on. We proceed as follows:

$$(1 - c) S_n = (1 - c)(1 + c + \cdots + c^n) = 1 - c^{n+1} ,$$

whence

$$S_n = \frac{1 - c^{n+1}}{1 - c} = \frac{1}{1 - c} - \frac{c^{n+1}}{1 - c} .$$

Therefore taking the modulus, we see that

$$\left| S_n - \frac{1}{1 - c} \right| = \frac{|c|^{n+1}}{|1 - c|} ,$$

which converges to 0 as $n \to \infty$ since $|c| < 1$. Therefore

$$\sum_{j=0}^{\infty} c^j = \frac{1}{1 - c} \qquad \text{if } |c| < 1 . \tag{2.45}$$

As an example of the Comparison Test, consider the series

$$\sum_{j=0}^{\infty} \frac{3 + 2i}{(j+1)^j} . \tag{2.46}$$


Its coefficient sequence converges to zero. Notice also that

$$\left| \frac{3 + 2i}{(j+1)^j} \right| = \frac{|3 + 2i|}{(j+1)^j} = \frac{\sqrt{13}}{(j+1)^j} .$$

Hence for $j \geq 3$,

$$\left| \frac{3 + 2i}{(j+1)^j} \right| < \frac{1}{2^j} .$$

But since $\tfrac12 < 1$, the geometric series

$$\sum_{j=0}^{\infty} \frac{1}{2^j}$$

converges. Hence by the comparison test, the original series (2.46) converges
as well.
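The comparison used in this example can be checked on a computer. The sketch below assumes the coefficients $(3 + 2i)/(k+1)^k$ of (2.46); the function names and the range of indices tested are illustrative choices only.

```python
def coefficient(k):
    """k-th coefficient (3 + 2i)/(k + 1)**k of the series (2.46)."""
    return (3 + 2j) / (k + 1) ** k  # 2j is Python's notation for 2i

def partial_sum(n):
    """Sum of the coefficients with index 0, 1, ..., n."""
    return sum(coefficient(k) for k in range(n + 1))

# The comparison |c_k| < 1/2**k holds from k = 3 onwards (it fails at k = 2).
bound_holds = all(abs(coefficient(k)) < 0.5**k for k in range(3, 60))
print(bound_holds, abs(coefficient(2)), 0.5**2)

# Successive partial sums settle down quickly, as convergence predicts.
print(partial_sum(10), partial_sum(20))
```

Notice that the comparison only needs to hold for all sufficiently large indices, which is why the failure at $k = 2$ is harmless.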
A further convergence criterion is the Ratio Test: Let $\sum_{j=0}^{\infty} c_j$ be such
that the limit

$$L = \lim_{j \to \infty} \left| \frac{c_{j+1}}{c_j} \right|$$

exists. Then if $L < 1$ the series converges, and if $L > 1$ the series diverges.
(Alas, if $L = 1$ we cannot conclude anything.)


© Prove the Ratio Test. 


The Ratio Test does not contradict our experience so far: for the geomet- 
ric series L = |c|, and we certainly needed |c| < 1 for convergence. Moreover 
in this case L > 1 implies divergence. Similarly, the series (2.44) has L = 1, 
so that the test tells us nothing. The same goes for the series (2.43). Notice 
that there are series for which the Ratio Test cannot even be applied, since 
the limit L may not exist. 


Sequences and series of functions: uniform convergence 


Our primary interest in series and sequences being the construction of
analytic functions, let us now turn our attention to the important case of
sequences and series of functions. Consider a sequence $\{f_n\}$ whose elements
are functions $f_n(z)$ defined on some domain in the complex plane. For a
fixed point $z$ we can study the sequence of complex numbers $\{f_n(z)\}$ and
analyse its convergence. If it does converge, let us call the limit $f(z)$; that
is, $f_n(z) \to f(z)$. This procedure defines a function $f$ for those $z$ such that
the sequence $\{f_n(z)\}$ converges. If this is the case we say that the sequence
$\{f_n\}$ converges pointwise to $f$. Now suppose that each $f_n$ is continuous (or
analytic): will $f$ be continuous (or analytic)? It turns out that pointwise
convergence is too weak to guarantee that the limit function shares
these properties of the $f_n$.

For instance, it is easy to cook up a pointwise limit of analytic functions
which is not even continuous. Consider the functions $f_n(z) = \exp(-n z^2)$.
Clearly these functions are analytic for each $n$. Let us now consider the
functions restricted to the real axis: $z = x$, and consider the limit function
$f(x)$. For all $n$, $f_n(0) = 1$, whence in the limit $f(0) = 1$. On the other hand,
let $x \neq 0$. Then given any $\epsilon > 0$, however small, there will be $N$ such that
$\exp(-n x^2) < \epsilon$ for $n > N$. Hence

$$f(x) = \begin{cases} 1 & \text{for } x = 0; \\ 0 & \text{otherwise.} \end{cases}$$

In other words, the limit function has a discontinuity at the origin.
Continuity would require $f(0) = 0$. To understand what is going on here, notice
that to make $f_n(x) < \epsilon$ we require that

$$e^{-n x^2} < \epsilon \implies n > -\frac{\log \epsilon}{x^2} ,$$

as can be easily seen by taking the logarithm of both sides of the first
inequality. Hence as $x$ becomes smaller, the value of $n$ has to be larger and
larger to the extent that in the limit as $x \to 0$, there is no $n$ for which this
is the case.
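To get a feeling for how badly the required $n$ behaves as $x \to 0$, one can tabulate it. The following Python sketch computes the smallest integer $n$ with $\exp(-n x^2) < \epsilon$; the function name `smallest_n` and the sample values of $x$ and $\epsilon$ are illustrative assumptions.

```python
import math

def smallest_n(x, eps):
    """Smallest integer n with exp(-n * x**2) < eps, for x != 0 and 0 < eps < 1."""
    return math.floor(-math.log(eps) / x**2) + 1

eps = 1e-3
for x in (1.0, 0.1, 0.01):
    n = smallest_n(x, eps)
    # The threshold grows like 1/x**2 as x approaches the origin.
    print(x, n, math.exp(-n * x**2))
```

No single $n$ works for every $x \neq 0$ at once, which is precisely what the definition of uniform convergence will rule out.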

The above “post mortem” analysis prompts the following definition. A
sequence of functions $\{f_n\}$ is said to converge to a function $f$ uniformly in
a subset $U$ if given any $\epsilon > 0$ there exists an $N$ such that for all $n > N$,

$$|f_n(z) - f(z)| < \epsilon \qquad \text{for all } z \in U .$$

In other words, $N$ can depend on $\epsilon$ but not on $z$.

Similarly one says that a series of functions

$$\sum_{j=0}^{\infty} f_j(z)$$

converges pointwise or uniformly if the sequence of partial sums does.

To show that this definition takes care of the kind of pathologies
encountered above, let us first of all prove that the uniform limit of continuous
functions is again continuous. Indeed, let $\{f_n(z)\}$ be a sequence of functions
which are continuous at $z_0$, and let it converge to a function $f(z)$ uniformly
in a neighbourhood of $z_0$. We claim that $f(z)$ is continuous at $z_0$. This
means that given any $\epsilon > 0$, there exists $\delta > 0$ such that $|f(z) - f(z_0)| < \epsilon$
whenever $|z - z_0| < \delta$. To prove this we will employ a device known as the
$\epsilon/3$ trick. Let us rewrite $|f(z) - f(z_0)|$ as follows:

$$\begin{aligned} |f(z) - f(z_0)| &= |f(z) - f_n(z) + f_n(z) - f_n(z_0) + f_n(z_0) - f(z_0)| \\ &\leq |f(z) - f_n(z)| + |f_n(z) - f_n(z_0)| + |f_n(z_0) - f(z_0)| \end{aligned}$$

by the triangle inequality. Now, because $f_n(z) \to f(z)$ uniformly, we can
choose $n$ above so large that $|f(z) - f_n(z)| < \epsilon/3$ for all $z$ in the neighbourhood, so in particular
for $z = z_0$. Similarly, because $f_n(z)$ is continuous at $z_0$, there exists $\delta$ such
that $|f_n(z) - f_n(z_0)| < \epsilon/3$ whenever $|z - z_0| < \delta$. Therefore,

$$|f(z) - f(z_0)| < \epsilon/3 + \epsilon/3 + \epsilon/3 = \epsilon .$$

In other words, we have shown that


the uniform limit of continuous functions is continuous.


Similarly we will see that the uniform limit of analytic functions is an- 
alytic. Uniform convergence is sufficiently strong to allow us to manipulate 
sequences of functions naively and yet sufficiently weak to allow for many 
examples. For instance we will see that if a series converges uniformly to a 
function, then the series can be differentiated and integrated termwise and 
it will converge to the derivative or integral of the limit function. 

In practice, the way one checks that a sequence $\{f_n\}$ of functions
converges uniformly in $U$ to a function $f$ is to write

$$f_n(z) = f(z) + R_n(z)$$

and then to see whether the remainder $R_n(z)$ can be made arbitrarily small
for some large enough $n$ independently of $z$ in $U$. Let us see this for the
geometric series:

$$\sum_{j=0}^{\infty} z^j . \tag{2.47}$$

The partial sums are the functions

$$f_n(z) = \sum_{j=0}^{n} z^j = 1 + z + \cdots + z^n = \frac{1 - z^{n+1}}{1 - z} .$$

We claim that this geometric series converges uniformly to the function $1/(1 - z)$
on every closed disk $|z| \leq R$ with $R < 1$. Indeed, we have the following
estimate for the remainder:

$$\left| f_n(z) - \frac{1}{1 - z} \right| = \frac{|z|^{n+1}}{|1 - z|} \leq \frac{R^{n+1}}{|1 - z|} .$$

Now, using the triangle inequality (2.36),

$$|z - 1| = |1 - z| \geq 1 - |z| \qquad \text{whence} \qquad \frac{1}{|1 - z|} \leq \frac{1}{1 - |z|} \leq \frac{1}{1 - R} .$$

In other words,

$$\left| f_n(z) - \frac{1}{1 - z} \right| \leq \frac{R^{n+1}}{1 - R} .$$

This bound is independent of $z$ and can be made as small as desired since
$R < 1$, whence the convergence is uniform.
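The estimate just derived can be tested numerically. The Python sketch below samples points on the boundary circle $|z| = R$ and compares the largest observed error with $R^{n+1}/(1 - R)$; the helper names, the values $R = 0.9$ and $n = 50$, and the number of sample points are all assumptions made for illustration.

```python
import cmath

def partial_sum(z, n):
    """f_n(z) = 1 + z + ... + z**n."""
    return sum(z**j for j in range(n + 1))

def worst_error(R, n, samples=400):
    """Largest |f_n(z) - 1/(1 - z)| over sample points on the circle |z| = R."""
    worst = 0.0
    for k in range(samples):
        z = R * cmath.exp(2j * cmath.pi * k / samples)
        worst = max(worst, abs(partial_sum(z, n) - 1 / (1 - z)))
    return worst

R, n = 0.9, 50
bound = R ** (n + 1) / (1 - R)
# The observed error never exceeds the z-independent bound.
print(worst_error(R, n), bound)
```

Sampling finitely many points is of course no substitute for the proof, but it does show that the bound is attained (at $z = R$) and hence cannot be improved.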

Another way to check for uniform convergence is the Weierstrass M-test,
which generalises the Comparison Test. Suppose that $\sum_{j=0}^{\infty} M_j$ is a
convergent series with real non-negative terms $M_j \geq 0$. Suppose further
that for all $z$ in some subset $U$ of the complex plane and for all sufficiently
large $j$, $|f_j(z)| \leq M_j$. Then the series $\sum_{j=0}^{\infty} f_j(z)$ converges uniformly in $U$.
(Notice that the Comparison Test is obtained as a special case, when the $f_j(z)$
are constant functions.)


© Proof of the Weierstrass M-test. 


Using the Weierstrass M-test we can prove the uniform convergence of the
geometric series on any closed disk $|z| \leq R < 1$. Indeed, notice that
$|z^j| = |z|^j \leq R^j$ and that since $R < 1$, the geometric series $\sum_{j=0}^{\infty} R^j$ converges.


2.3.2 Taylor series 


In this section we will prove the remarkable result that a function analytic 
in the neighbourhood of a point can be approximated by a sequence of poly- 
nomials, namely by its Taylor series. Moreover we will see that convergence 
is uniform inside the largest open disk over which the function is analytic. 

The Taylor series of a function is the result of successive approximations
of the function by polynomials. Suppose that $f(z)$ is analytic in a
neighbourhood of $z_0$. Then as we saw in Section 2.2.5, $f$ is infinitely differentiable
around $z_0$. Let us then write down a polynomial function $f_n$ which
agrees with $f$ at $z_0$ up to and including its $n$-th derivative. In other words,
$f_n^{(j)}(z_0) = f^{(j)}(z_0)$ for $j = 0, 1, \ldots, n$. The polynomial function of least order
which satisfies this condition is

$$f_n(z) = f(z_0) + f'(z_0)(z - z_0) + \frac{f''(z_0)}{2}(z - z_0)^2 + \cdots + \frac{f^{(n)}(z_0)}{n!}(z - z_0)^n .$$

The sequence $\{f_n\}$, if it converges, does so to the Taylor series around $z_0$ of
the function $f$:

$$\sum_{j=0}^{\infty} \frac{f^{(j)}(z_0)}{j!} (z - z_0)^j . \tag{2.48}$$

(If $z_0 = 0$ this series is also called the Maclaurin series of $f$.)

We will now prove the following important result: Let $f(z)$ be analytic
in the disk $|z - z_0| < R$ centred at $z_0$. Then the Taylor series for $f$ around
$z_0$ converges to $f(z)$ for all $z$ in the disk and moreover the convergence is
uniform on any closed subdisk $|z - z_0| \leq r < R$.

The proof uses the generalised Cauchy Integral Formula with an appropriate
choice of contour, as shown in the diagram. Let $\Gamma$ denote the positively oriented
circle centred at $z_0$ with radius $\rho$ where $r < \rho < R$. By hypothesis, $f$ is
analytic in and on the contour $\Gamma$, whence for any $z$ satisfying $|z - z_0| \leq r$,
we have the Cauchy Integral Formula:

$$f(z) = \frac{1}{2\pi i} \oint_{\Gamma} \frac{f(\zeta)}{\zeta - z} \, d\zeta .$$


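Now we rewrite the integrand:

$$\frac{1}{\zeta - z} = \frac{1}{(\zeta - z_0) - (z - z_0)} = \frac{1}{\zeta - z_0} \, \frac{1}{1 - \frac{z - z_0}{\zeta - z_0}} ,$$

and use the geometric series to write

$$\frac{1}{1 - \frac{z - z_0}{\zeta - z_0}} = \sum_{j=0}^{\infty} \left( \frac{z - z_0}{\zeta - z_0} \right)^j ,$$

which is valid because $|z - z_0| \leq r < \rho = |\zeta - z_0|$. Putting it all together, we
have

$$\frac{1}{\zeta - z} = \sum_{j=0}^{\infty} \frac{(z - z_0)^j}{(\zeta - z_0)^{j+1}} .$$

Inserting it into the Cauchy Integral Formula,

$$f(z) = \frac{1}{2\pi i} \oint_{\Gamma} \sum_{j=0}^{\infty} \frac{f(\zeta)}{(\zeta - z_0)^{j+1}} (z - z_0)^j \, d\zeta .$$

Now we would be tempted to interchange the order of the integral and the
summation and arrive at

$$f(z) = \sum_{j=0}^{\infty} \left[ \frac{1}{2\pi i} \oint_{\Gamma} \frac{f(\zeta)}{(\zeta - z_0)^{j+1}} \, d\zeta \right] (z - z_0)^j = \sum_{j=0}^{\infty} \frac{f^{(j)}(z_0)}{j!} (z - z_0)^j ,$$

where we have used the generalised Cauchy Integral Formula. This
manipulation turns out to be allowed, but doing it this way we do not see the
uniform convergence. This is done with more care below.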


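Let us prove the Taylor series theorem carefully. It is not hard, but it takes a bit more
bookkeeping. Rather than using the geometric series in its entirety, let us use its $n$-th
partial sum:

$$\frac{1}{1 - \frac{z - z_0}{\zeta - z_0}} = \sum_{j=0}^{n} \left( \frac{z - z_0}{\zeta - z_0} \right)^j + \frac{\left( \frac{z - z_0}{\zeta - z_0} \right)^{n+1}}{1 - \frac{z - z_0}{\zeta - z_0}} ,$$

whence

$$\frac{1}{\zeta - z} = \sum_{j=0}^{n} \frac{(z - z_0)^j}{(\zeta - z_0)^{j+1}} + \frac{1}{\zeta - z} \left( \frac{z - z_0}{\zeta - z_0} \right)^{n+1} .$$

Inserting this into the Cauchy Integral Formula, we have

$$f(z) = \frac{1}{2\pi i} \oint_\Gamma \left[ \sum_{j=0}^{n} \frac{f(\zeta)}{(\zeta - z_0)^{j+1}} (z - z_0)^j + \frac{f(\zeta)}{\zeta - z} \left( \frac{z - z_0}{\zeta - z_0} \right)^{n+1} \right] d\zeta .$$

Now this is only a finite sum, so by linearity we can integrate it term by term. Using the
generalised Cauchy Integral Formula we have

$$f(z) = \sum_{j=0}^{n} \frac{f^{(j)}(z_0)}{j!} (z - z_0)^j + R_n(z) ,$$

where

$$R_n(z) = \frac{1}{2\pi i} \oint_\Gamma \frac{f(\zeta)}{\zeta - z} \left( \frac{z - z_0}{\zeta - z_0} \right)^{n+1} d\zeta .$$

In other words,

$$f(z) = f_n(z) + R_n(z) ,$$

whence in order to prove uniform convergence of the Taylor series, we only have to show
that we can make $|R_n(z)|$ as small as desired for all $z$ by simply taking $n$ sufficiently large.

Let us estimate $|R_n(z)|$. Using (2.28),

$$|R_n(z)| \leq \frac{1}{2\pi} \max_{\zeta \in \Gamma} \frac{|f(\zeta)|}{|\zeta - z|} \left| \frac{z - z_0}{\zeta - z_0} \right|^{n+1} \ell(\Gamma) .$$

We now use that $|z - z_0| \leq r$, $|\zeta - z_0| = \rho$, $|f(\zeta)| \leq M$ for some $M$, $\ell(\Gamma) = 2\pi\rho$, and the
triangle inequality (2.36),

$$|\zeta - z| = |(\zeta - z_0) - (z - z_0)| \geq |\zeta - z_0| - |z - z_0| \geq \rho - r ,$$

whence

$$\frac{1}{|\zeta - z|} \leq \frac{1}{\rho - r} .$$

Therefore,

$$|R_n(z)| \leq \frac{M \rho}{\rho - r} \left( \frac{r}{\rho} \right)^{n+1} .$$

This is what we wanted, because the right-hand side does not depend on $z$ and can be
made as small as desired by taking $n$ large, since $r/\rho < 1$. This proves uniform convergence
of the Taylor series.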
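As a sanity check of the remainder estimate, one can compare it with the actual error for a concrete function. The sketch below uses $f(z) = e^z$ about $z_0 = 0$ with $r = 1$ and $\rho = 2$; this choice of function and radii, like the helper names, is an assumption made here purely for illustration. For this $f$ one may take $M = e^{\rho}$.

```python
import cmath
import math

def taylor_partial_exp(z, n):
    """n-th Taylor polynomial of exp about z0 = 0."""
    return sum(z**j / math.factorial(j) for j in range(n + 1))

def remainder_bound(M, r, rho, n):
    """The bound M*rho/(rho - r) * (r/rho)**(n + 1) on |R_n(z)| for |z - z0| <= r."""
    return M * rho / (rho - r) * (r / rho) ** (n + 1)

r, rho, n = 1.0, 2.0, 10
M = math.exp(rho)  # max of |exp(zeta)| on the circle |zeta| = rho

# Largest observed remainder over sample points on the circle |z| = r.
samples = 360
worst = max(
    abs(cmath.exp(z) - taylor_partial_exp(z, n))
    for z in (r * cmath.exp(2j * cmath.pi * k / samples) for k in range(samples))
)
print(worst, remainder_bound(M, r, rho, n))
```

The bound is far from sharp for the exponential, but sharpness is not the point: what matters for uniform convergence is that it is independent of $z$ and tends to zero.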


Notice that this result implies that the Taylor series will converge to 
f(z) everywhere inside the largest open disk, centred at zo, over which f is 


analytic. 
As an example, let us compute the Taylor series for the functions $\operatorname{Log} z$
around $z_0 = 1$ and also $1/(1 - z)$ around $z_0 = 0$. The derivatives of the
principal branch of the logarithm are:

$$\frac{d^j \operatorname{Log} z}{dz^j} = (-1)^{j+1} \frac{(j-1)!}{z^j} \qquad \text{for } j \geq 1 .$$

Evaluating at $z = 1$ and constructing the Taylor series, we have

$$\operatorname{Log} z = \sum_{j=1}^{\infty} \frac{(-1)^{j+1}}{j} (z - 1)^j .$$

This series is valid for $|z - 1| < 1$ which is the largest open disk centred at
$z = 1$ over which $\operatorname{Log} z$ is analytic, as seen in Figure 2.8.

Figure 2.8: Analyticity disks for the Taylor series of $\operatorname{Log} z$ and $1/(1 - z)$.

Similarly,

$$\frac{d^j}{dz^j} \frac{1}{1 - z} = \frac{j!}{(1 - z)^{j+1}} ,$$

whence evaluating at $z = 0$ and building the Taylor series we find the
geometric series

$$\frac{1}{1 - z} = \sum_{j=0}^{\infty} z^j ,$$

which is valid for $|z| < 1$ since that is the largest open disk around the origin
over which $1/(1 - z)$ is analytic, as seen in Figure 2.8. Now notice something
remarkable. We have two a priori different series representations for the
function $1/(1 - z)$ around the origin: one is the Taylor series and another is
the geometric series. Yet we have shown that these series are the same. This
is not a coincidence and we will see in Section 2.3.3 that series representations
for analytic functions are unique: they are all essentially Taylor series.
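The Taylor series for $\operatorname{Log} z$ about $z_0 = 1$ is easily compared with a library implementation of the principal branch. In the following sketch, `cmath.log` is Python's principal logarithm; the sample point and the number of terms are arbitrary illustrative choices.

```python
import cmath

def log_series(z, terms=60):
    """Partial sum of sum_{j>=1} (-1)**(j+1) * (z - 1)**j / j."""
    return sum((-1) ** (j + 1) * (z - 1) ** j / j for j in range(1, terms + 1))

z = 1.3 + 0.4j  # |z - 1| = 0.5, inside the disk of convergence
print(log_series(z), cmath.log(z))
```

Points with $|z - 1| > 1$ lie outside the disk of convergence and the partial sums there do not settle down.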


Basic properties of Taylor series 


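Taking the derivative of the Taylor series for $\operatorname{Log} z$ about $z = 1$ term by
term, we find the series

$$\sum_{j=1}^{\infty} (-1)^{j+1} (z - 1)^{j-1} = \sum_{j=0}^{\infty} (-1)^{j} (z - 1)^{j} = \sum_{j=0}^{\infty} (1 - z)^{j} .$$

This is a geometric series which for $|z - 1| < 1$ converges to

$$\frac{1}{1 - (1 - z)} = \frac{1}{z} ,$$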


which is precisely the derivative of Log z. This might not seem at all remark- 
able, but it is. There is no reason a priori why the termwise differentiation 
of an infinite series which converges to a function f(z), should converge to 
the derivative f’(z) of the function. This is because there are two limits 
involved: the limit in the definition of the derivative and the one which we 
take to approach the function f(z), and we know from previous experience 
that the order in which one takes limits matters in general. On the other 
hand, what we have just seen is that for the case of the Log z function, these 
two limits commute; that is, they can be taken in any order. It turns out 
that this is not just a property of Log z but indeed of any analytic function. 

To see this recall that we saw in Section 2.2.5 that if a function $f(z)$ is
analytic in a disk $|z - z_0| < R$, then so are all its derivatives. In particular
$f(z)$ and $f'(z)$ have Taylor series in the disk which converge uniformly on
any closed subdisk. The Taylor series for $f'(z)$ is given by equation (2.48)
applied to $f'$ instead of $f$:

$$\sum_{j=0}^{\infty} \frac{(f')^{(j)}(z_0)}{j!} (z - z_0)^j .$$

But notice that the $j$-th derivative of $f'$ is just the $(j+1)$-st derivative of $f$:
$(f')^{(j)} = f^{(j+1)}$. Therefore we can rewrite the above Taylor series as

$$\sum_{j=0}^{\infty} \frac{f^{(j+1)}(z_0)}{j!} (z - z_0)^j . \tag{2.49}$$

On the other hand, differentiating the Taylor series (2.48) for $f$ termwise, we
get

$$\sum_{j=1}^{\infty} \frac{f^{(j)}(z_0)}{j!} \, j \, (z - z_0)^{j-1} = \sum_{j=1}^{\infty} \frac{f^{(j)}(z_0)}{(j-1)!} (z - z_0)^{j-1} = \sum_{k=0}^{\infty} \frac{f^{(k+1)}(z_0)}{k!} (z - z_0)^{k} ,$$

where we have reindexed the last sum by introducing $k = j - 1$. Finally,
Shakespeare’s Theorem (the name of a summation index is immaterial) tells us
that this last series is the same as the one
in equation (2.49). In other words, we have proven that if $f(z)$ is analytic
around $z_0$, the Taylor series for $f'(z)$ around $z_0$ is obtained by termwise
differentiation of the Taylor series for $f(z)$ around $z_0$.

Similarly one can show that Taylor series have additional properties. Let
$f(z)$ and $g(z)$ be analytic around $z_0$. That means that there is some disk
$|z - z_0| < R$ in which the two functions are analytic. Then, as shown
earlier, $\alpha f(z)$, for $\alpha$ any complex number, and $f(z) + g(z)$ are also
analytic in the disk. Then one can show:


• The Taylor series for $\alpha f(z)$ is the series obtained by multiplying each
term in the Taylor series for $f(z)$ by $\alpha$:

$$\sum_{j=0}^{\infty} \frac{\alpha f^{(j)}(z_0)}{j!} (z - z_0)^j .$$


• The Taylor series of $f(z) + g(z)$ is the series obtained by adding the
terms for the Taylor series of $f(z)$ and $g(z)$:

$$\sum_{j=0}^{\infty} \frac{f^{(j)}(z_0) + g^{(j)}(z_0)}{j!} (z - z_0)^j .$$


These results follow from equations and (2.10). 

Finally, let $f(z)$ and $g(z)$ be analytic in a disk $|z - z_0| < R$ around $z_0$.
We also saw in Section 2.1.4 that their product $f(z)g(z)$ is analytic there.
Therefore it has a Taylor series which converges uniformly in any closed
subdisk. What is the relation between this series and the Taylor series for
$f(z)$ and $g(z)$? Let us compute the first couple of terms. We have that the
first few derivatives of $fg$ are

$$\begin{aligned} (fg)(z_0) &= f(z_0) g(z_0) \\ (fg)'(z_0) &= f'(z_0) g(z_0) + f(z_0) g'(z_0) \\ (fg)''(z_0) &= f''(z_0) g(z_0) + 2 f'(z_0) g'(z_0) + f(z_0) g''(z_0) , \end{aligned}$$

so that the first few terms of the Taylor series for $fg$ are

$$f(z_0) g(z_0) + \left( f'(z_0) g(z_0) + f(z_0) g'(z_0) \right) (z - z_0) + \frac{f''(z_0) g(z_0) + 2 f'(z_0) g'(z_0) + f(z_0) g''(z_0)}{2} (z - z_0)^2 + \cdots$$

Notice that this can be rewritten as follows:

$$\left( f(z_0) + f'(z_0)(z - z_0) + \frac{f''(z_0)}{2} (z - z_0)^2 + \cdots \right) \times \left( g(z_0) + g'(z_0)(z - z_0) + \frac{g''(z_0)}{2} (z - z_0)^2 + \cdots \right) ,$$

which looks like the product of the first few terms in the Taylor series of
$f$ and $g$. Appearances do not lie in this case and one can show that the
Taylor series for the product $fg$ of any two analytic functions is the product
of their Taylor series, provided one defines the product of the Taylor series
appropriately.


Let us see this. To save some writing let me write the Taylor series for $f(z)$ as $\sum_{j=0}^{\infty} a_j (z - z_0)^j$
and for $g(z)$ as $\sum_{j=0}^{\infty} b_j (z - z_0)^j$. In other words, I have introduced abbreviations
$a_j = f^{(j)}(z_0)/j!$ and $b_j = g^{(j)}(z_0)/j!$. The Cauchy product of these two series is defined
by multiplying the series formally and collecting terms with the same power of $z - z_0$. In
other words,

$$\left( \sum_{j=0}^{\infty} a_j (z - z_0)^j \right) \times \left( \sum_{j=0}^{\infty} b_j (z - z_0)^j \right) = \sum_{j=0}^{\infty} c_j (z - z_0)^j ,$$

where

$$c_j = \sum_{\substack{k, \ell = 0 \\ k + \ell = j}}^{j} a_k b_\ell = \sum_{k=0}^{j} a_k b_{j-k} = \sum_{k=0}^{j} \frac{f^{(k)}(z_0)}{k!} \, \frac{g^{(j-k)}(z_0)}{(j-k)!} .$$

On the other hand, the Taylor series for $fg$ can be written as

$$\sum_{j=0}^{\infty} \frac{(fg)^{(j)}(z_0)}{j!} (z - z_0)^j ,$$

where one can use the generalised Leibniz rule to obtain

$$(fg)^{(j)}(z_0) = \sum_{k=0}^{j} \binom{j}{k} f^{(k)}(z_0) \, g^{(j-k)}(z_0) ,$$

where $\binom{j}{k}$ is the binomial coefficient

$$\binom{j}{k} = \frac{j!}{k! \, (j-k)!} .$$

Therefore the Taylor series for $fg$ can be written as

$$\sum_{j=0}^{\infty} \sum_{k=0}^{j} \frac{f^{(k)}(z_0) \, g^{(j-k)}(z_0)}{k! \, (j-k)!} (z - z_0)^j = \sum_{j=0}^{\infty} c_j (z - z_0)^j ,$$

with the $c_j$ being the same as above.
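The Cauchy product is a finite computation on coefficients and is easily coded. The following Python sketch works with truncated coefficient lists and exact rational arithmetic. The test case, $e^z \cdot e^z = e^{2z}$ about the origin, is an illustrative choice made here, as is the function name `cauchy_product`.

```python
from fractions import Fraction
from math import factorial

def cauchy_product(a, b):
    """Coefficients c_j = sum_{k=0}^{j} a_k * b_{j-k} of the product series,
    computed for as many j as both truncated lists allow."""
    n = min(len(a), len(b))
    return [sum(a[k] * b[j - k] for k in range(j + 1)) for j in range(n)]

# Taylor coefficients 1/j! of exp(z) about the origin, truncated at order 7.
exp_coeffs = [Fraction(1, factorial(j)) for j in range(8)]

# exp(z) * exp(z) = exp(2z), whose Taylor coefficients are 2**j / j!.
product = cauchy_product(exp_coeffs, exp_coeffs)
expected = [Fraction(2**j, factorial(j)) for j in range(8)]
print(product == expected)
```

The agreement of the two lists is an instance of the identity $\sum_{k=0}^{j} \binom{j}{k} = 2^j$, which is the generalised Leibniz rule applied to this example.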


2.3.3 Power series 


Taylor series are examples of a more general type of series, called power series,
whose study is the purpose of this section. We will see that power series are
basically always the Taylor series of some analytic function. This shows that
series representations of analytic functions are in some sense unique, so that
if we manage to cook up, by whatever means, a power series converging to
a function in some disk, we know that this series will be the Taylor series of
the function around the centre of the disk.

By a power series around $z_0$ we mean a series of the form

$$\sum_{j=0}^{\infty} a_j (z - z_0)^j ,$$

where the $\{a_j\}$ are known as the coefficients of the power series. A power
series is clearly determined by its coefficients and by the point $z_0$. Given a
power series one can ask many questions: For which $z$ does it converge? Is
the convergence uniform? Will it converge to an analytic function? Will the
power series be a Taylor series?

We start the section with the following result, which we will state without
proof. It says that to any power series $\sum_{j=0}^{\infty} a_j (z - z_0)^j$ one can associate a
number $0 \leq R \leq \infty$, called the radius of convergence, depending only on
the coefficients $\{a_j\}$, such that the series converges in the disk $|z - z_0| < R$,
uniformly on any closed subdisk, and the series diverges in $|z - z_0| > R$.


© Introduce lim sup, root test and the proof of this theorem. 


One can actually give a formula for the number $R$ in terms of the
coefficients $\{a_j\}$ but we will not do so here in general. Instead we will give a
formula which is valid only in those cases when the Ratio Test can be used.

Recall that the Ratio Test says that if the limit

$$L = \lim_{j \to \infty} \left| \frac{c_{j+1}}{c_j} \right| \tag{2.50}$$

exists, then the series $\sum_{j=0}^{\infty} c_j$ converges for $L < 1$ and diverges for $L > 1$.
In the case of a power series, we have

$$L = \lim_{j \to \infty} \left| \frac{a_{j+1} (z - z_0)^{j+1}}{a_j (z - z_0)^j} \right| = \lim_{j \to \infty} \left| \frac{a_{j+1}}{a_j} \right| \, |z - z_0| .$$

Therefore convergence is guaranteed if $L < 1$, which is equivalent to

$$|z - z_0| < \lim_{j \to \infty} \left| \frac{a_j}{a_{j+1}} \right| ,$$

and divergence is guaranteed for $L > 1$, which is equivalent to

$$|z - z_0| > \lim_{j \to \infty} \left| \frac{a_j}{a_{j+1}} \right| .$$

Therefore if the limit (2.50) exists, we have that the radius of convergence is
given by

$$R = \lim_{j \to \infty} \left| \frac{a_j}{a_{j+1}} \right| . \tag{2.51}$$


Notice that this agrees with our experience with the geometric series (2.47), 
which is clearly a power series around the origin. Since all the coefficients are 
equal, the limit exists and R = 1, which is precisely the radius of convergence 
we had established previously. 
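Formula (2.51) suggests a crude numerical estimate of the radius of convergence: evaluate $|a_j / a_{j+1}|$ at some large $j$. The sketch below does this for three coefficient sequences. Only the first (all coefficients equal to 1) comes from the text; the other two, and all the function names, are hypothetical examples introduced here.

```python
def ratio_estimate(a, j):
    """Finite-j stand-in for the limit (2.51): |a_j / a_{j+1}|."""
    return abs(a(j) / a(j + 1))

def geometric(j):
    """Coefficients of the geometric series (2.47): all equal to 1."""
    return 1.0

def doubling(j):
    """Hypothetical example: a_j = 2**j, the coefficients of sum (2z)**j."""
    return 2.0**j

def reciprocal(j):
    """Hypothetical example: a_j = 1/(j + 1)."""
    return 1.0 / (j + 1)

print(ratio_estimate(geometric, 50))        # radius of convergence 1
print(ratio_estimate(doubling, 50))         # radius of convergence 1/2
print(ratio_estimate(reciprocal, 10**6))    # slowly approaches radius 1
```

A single ratio at finite $j$ is only suggestive: the limit may be approached slowly, as in the third example, or may fail to exist altogether.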


Power series are Taylor series 


We are now going to prove the main result of this section: that a power series
is the Taylor series of the function it converges to. This is a very useful
result, because it says that in order to compute the Taylor series of a function
it is enough to produce any power series which converges to that function.
The proof will follow two steps. The first is to show that a power series
converges to an analytic function and the second step will use the Cauchy
Integral formula to relate the coefficients of the power series with those of
the Taylor series. The first step will itself require two preliminary results,
which we state in some more generality.

Suppose that $\{f_n\}$ is a sequence of continuous functions which converges
uniformly to a function $f(z)$ in the closed disk $|z - z_0| \leq R$. Let $\Gamma$ be
any contour (not necessarily closed) inside the disk, and let $\ell$ be the length
of the contour. Then we claim that the sequence $\int_\Gamma f_n(z) \, dz$ converges to
the integral $\int_\Gamma f(z) \, dz$. To see this, let $\epsilon > 0$. Then because of uniform
convergence, there exists $N$ depending only on $\epsilon$ such that for all $n \geq N$,
one has $|f(z) - f_n(z)| < \epsilon/\ell$ for all $z$ in the disk. Then

$$\begin{aligned} \left| \int_\Gamma f(z) \, dz - \int_\Gamma f_n(z) \, dz \right| &= \left| \int_\Gamma \left( f(z) - f_n(z) \right) dz \right| \\ &\leq \max_{z \in \Gamma} |f(z) - f_n(z)| \, \ell \qquad \text{(using (2.28))} \\ &< (\epsilon/\ell) \, \ell = \epsilon . \end{aligned}$$

Now suppose that the sequence {f,,} is the sequence of partial sums of some 
infinite series of functions. Then the above result says that one can integrate 
the series termwise, since for any partial sum, the integral of the sum is 
the sum of the integrals. In other words, when integrating an infinite series 
which converges uniformly in some region U along any contour in U, we can 
interchange the order of the summation and the integration. 

Now suppose that the functions $\{f_n\}$ are not just continuous but actually
analytic, and let $\Gamma$ be any loop; that is, a closed simple contour. Then by
the Cauchy Integral Theorem, $\oint_\Gamma f_n(z) \, dz = 0$, whence by what we have just
shown

$$\oint_\Gamma f(z) \, dz = \lim_{n \to \infty} \oint_\Gamma f_n(z) \, dz = 0 .$$

Therefore by Morera’s theorem, $f(z)$ is also analytic. Therefore we have
shown that


the uniform limit of analytic functions is analytic.


In particular, let $\sum_{j=0}^{\infty} a_j (z - z_0)^j$ be a power series with circle of
convergence $|z - z_0| = R > 0$. Since each of the partial sums, being a polynomial
function, is analytic in the disk (in fact, in the whole plane), and since the
convergence is uniform on every closed subdisk, the limit is also
analytic in the disk. In other words, a power series converges to an analytic
function inside its disk of convergence.

Now that we know that $\sum_{j=0}^{\infty} a_j (z - z_0)^j$ defines an analytic function,
call it $f(z)$, in its disk of convergence, we can compute its Taylor series and
compare it with the original series. The Taylor series of $f(z)$ around $z_0$ has
coefficients given by the generalised Cauchy Integral Formula:

$$\frac{f^{(j)}(z_0)}{j!} = \frac{1}{2\pi i} \oint_\Gamma \frac{f(z)}{(z - z_0)^{j+1}} \, dz ,$$

where $\Gamma$ is any positively oriented loop inside the disk of convergence of the
power series which contains the point $z_0$ in its interior. Because the power
series converges uniformly, we can now substitute the power series for $f(z)$
inside the integral and compute the integral termwise:

$$\begin{aligned} \frac{f^{(j)}(z_0)}{j!} &= \frac{1}{2\pi i} \oint_\Gamma \frac{f(z)}{(z - z_0)^{j+1}} \, dz \\ &= \frac{1}{2\pi i} \oint_\Gamma \sum_{k=0}^{\infty} a_k \frac{(z - z_0)^k}{(z - z_0)^{j+1}} \, dz \\ &= \sum_{k=0}^{\infty} a_k \frac{1}{2\pi i} \oint_\Gamma (z - z_0)^{k-j-1} \, dz . \end{aligned}$$

But now, from the generalised Cauchy Integral Formula,

$$\oint_\Gamma (z - z_0)^{k-j-1} \, dz = \begin{cases} 2\pi i & \text{for } k = j; \\ 0 & \text{otherwise.} \end{cases} \tag{2.52}$$

Therefore, only one term contributes to the $\sum_k$, namely the term with $k = j$,
and hence we see that

$$\frac{f^{(j)}(z_0)}{j!} = a_j .$$

In other words, the power series is the Taylor series. Said differently, any
power series is the Taylor series of a function analytic in the disk of
convergence $|z - z_0| < R$.

For example, let us compute the Taylor series of the function

$$\frac{1}{(z - 1)(z - 2)}$$

in the disk $|z| < 1$. This is the largest disk centred at the origin where we
could hope to find a convergent power series for this function, since it has
singularities at $z = 1$ and $z = 2$. The naive solution to this problem would
be to take derivatives and evaluate them at the origin and build the Taylor
series this way. However from our discussion above, it is enough to exhibit
any power series which converges to this function in the specified region. We
use partial fractions to rewrite the function as a sum of simple fractions:

$$\frac{1}{(z - 1)(z - 2)} = \frac{1}{1 - z} - \frac{1}{2 - z} .$$

Now we use geometric series for each of them. For the first fraction we have

$$\frac{1}{1 - z} = \sum_{j=0}^{\infty} z^j \qquad \text{valid for } |z| < 1;$$

whereas for the second fraction we have

$$\frac{-1}{2 - z} = \frac{-1/2}{1 - (z/2)} = -\frac{1}{2} \sum_{j=0}^{\infty} \left( \frac{z}{2} \right)^j = -\sum_{j=0}^{\infty} \frac{z^j}{2^{j+1}} ,$$

which is valid for $|z| < 2$, which contains the region of interest. Therefore,
putting the two series together,

$$\frac{1}{(z - 1)(z - 2)} = \sum_{j=0}^{\infty} \left( 1 - \frac{1}{2^{j+1}} \right) z^j \qquad \text{for } |z| < 1 .$$
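The series just obtained can be checked against the closed form at any point of the disk $|z| < 1$. In the sketch below the sample point, the truncation order and the function name are illustrative assumptions.

```python
def series_value(z, terms=200):
    """Truncated power series sum_j (1 - 1/2**(j+1)) * z**j."""
    return sum((1 - 0.5 ** (j + 1)) * z**j for j in range(terms))

z = 0.3 + 0.4j  # |z| = 0.5, inside the disk |z| < 1
exact = 1 / ((z - 1) * (z - 2))
print(series_value(z), exact)
```

Notice that no derivative of the function was ever computed: the uniqueness result guarantees that the coefficients $1 - 2^{-(j+1)}$ are nevertheless the Taylor coefficients.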


2.3.4 Laurent series 


In the previous section we saw that any function which is analytic in some 
neighbourhood of a point zo can be approximated by a power series (its Taylor 
series) about that point. How about a function which has a “mild” singularity 
at zo? For example, how about a function of the form g(z)/(z — zo)? Might 
we not expect to be able to approximate it by some sort of power series? 
It certainly could not be a power series of the type we have been discussing
because these series are analytic at $z_0$. There is, however, a simple yet useful
generalisation of the notion of power series which can handle these cases.
These series are known as Laurent series and consist of a sum of two power 
series. 

A Laurent series about the point $z_0$ is a sum of two power series, one
consisting of non-negative powers of $z - z_0$ and the other of negative powers:

$$\sum_{j=0}^{\infty} a_j (z - z_0)^j + \sum_{j=1}^{\infty} a_{-j} (z - z_0)^{-j} .$$

Laurent series are often abbreviated as

$$\sum_{j=-\infty}^{\infty} a_j (z - z_0)^j ,$$

but we should keep in mind that this is only an abbreviation: conceptually
a Laurent series is the sum of two independent power series.

A Laurent series is said to converge if each of the power series converges.
The first series, being a power series in $z - z_0$, converges inside some circle of
convergence $|z - z_0| = R$, for some $0 \leq R \leq \infty$. The second series, however,
is a power series in $w = 1/(z - z_0)$. Hence it will converge inside a circle of
convergence $|w| = R'$; that is, for $|w| < R'$. If we let $R' = 1/r$, then this
condition translates into $|z - z_0| > r$. In other words, such a Laurent series
will converge in an annulus: $r < |z - z_0| < R$. (Of course for this to make
sense, we need $r < R$. If this is not the case, then the Laurent series does
not converge anywhere.)

It turns out that the results which are valid for Taylor series have
generalisations for Laurent series. The first main result that we will prove is that
any function analytic in an open annulus $r < |z - z_0| < R$ centred at $z_0$ has a
Laurent series around $z_0$ which converges to it everywhere inside the annulus
and uniformly on closed sub-annuli $r < R_1 \leq |z - z_0| \leq R_2 < R$. Moreover
the coefficients of the Laurent series are given by

$$a_j = \frac{1}{2\pi i} \oint_\Gamma \frac{f(z)}{(z - z_0)^{j+1}} \, dz \qquad \text{for } j = 0, \pm 1, \pm 2, \ldots ,$$

where $\Gamma$ is any positively oriented loop lying in the annulus and containing
$z_0$ in its interior.

Notice that this result generalises the Taylor series theorem proven earlier for
functions analytic in the disk. Indeed, if $f(z)$ were analytic in $|z - z_0| < R$,
then by the Cauchy Integral Theorem and the above formula for $a_j$, it
would follow that $a_{-j} = 0$ for $j = 1, 2, \ldots$, and hence that the Laurent
series is the Taylor series. Notice also that the Laurent series is a nontrivial
generalisation of the Taylor series in that the coefficients $a_{-j}$ for $j = 1, 2, \ldots$
are not just simply derivatives of the function, but rather require contour
integration.
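Since the Laurent coefficients are given by contour integrals, they can be approximated by discretising a circle inside the annulus. The sketch below does this with an equally spaced Riemann sum. The test function $1/((z-1)(z-2))$ in the annulus $1 < |z| < 2$, the radius $1.5$, the number of sample points and the function names are all choices made here for illustration. For this function, expanding the two partial fractions in geometric series gives $a_{-j} = -1$ for $j \geq 1$ and $a_j = -1/2^{j+1}$ for $j \geq 0$ in that annulus.

```python
import cmath

def laurent_coefficient(f, j, z0=0.0, radius=1.5, samples=2000):
    """Approximate a_j = (1/(2 pi i)) * integral of f(z)/(z - z0)**(j + 1) dz
    over the positively oriented circle |z - z0| = radius."""
    total = 0.0
    for k in range(samples):
        theta = 2 * cmath.pi * k / samples
        z = z0 + radius * cmath.exp(1j * theta)
        # dz = i * radius * exp(i theta) * d(theta) along the circle
        dz = 1j * radius * cmath.exp(1j * theta) * (2 * cmath.pi / samples)
        total += f(z) / (z - z0) ** (j + 1) * dz
    return total / (2j * cmath.pi)

def f(z):
    """Illustrative function, analytic in the annulus 1 < |z| < 2."""
    return 1 / ((z - 1) * (z - 2))

for j in (-2, -1, 0, 1):
    print(j, laurent_coefficient(f, j))
```

The negative-index coefficients do not vanish here, reflecting the fact that this function is not analytic in the whole disk $|z| < 2$.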


Figure 2.9: Contours $\Gamma$, $\Gamma_1$ and $\Gamma_2$.


In order to follow the logic of the proof, it will be convenient to keep Figure
2.9 in mind. The left-hand picture shows the annuli $r < R_1 \leq |z - z_0| \leq R_2 < R$
and the contour $\Gamma$. The right-hand picture shows the equivalent
contours $\Gamma_1$ and $\Gamma_2$, circles with radii $\rho_1$ and $\rho_2$ satisfying the inequalities
$r < \rho_1 < R_1$ and $R_2 < \rho_2 < R$.

Consider the closed contour $C$, starting and ending at the point $P$ in the
Figure, and defined as follows: follow $\Gamma_2$ all the way around until $P$ again,
then go to $Q$ via the ‘bridge’ between the two circles, then all the way along
$\Gamma_1$ until $Q$, then back to $P$ along the ‘bridge.’ Notice that, as part of $C$, the
inner circle $\Gamma_1$ is traversed in the negative (clockwise) sense. This contour encircles the
point $z$ once in the positive sense, hence by the Cauchy Integral Formula we
have that

$$f(z) = \frac{1}{2\pi i} \oint_C \frac{f(\zeta)}{\zeta - z} \, d\zeta .$$

On the other hand, because the ‘bridge’ is traversed twice in opposite
directions, its contributions to the integral cancel and we are left with

$$f(z) = \frac{1}{2\pi i} \oint_{\Gamma_1} \frac{f(\zeta)}{\zeta - z} \, d\zeta + \frac{1}{2\pi i} \oint_{\Gamma_2} \frac{f(\zeta)}{\zeta - z} \, d\zeta .$$

We now treat one integral at a time.

The integral along $\Gamma_2$ can be treated mutatis mutandis as we did the
similar integral in the proof of the Taylor series theorem in Section 2.3.2. We
simply quote the result:

$$\frac{1}{2\pi i} \oint_{\Gamma_2} \frac{f(\zeta)}{\zeta - z} \, d\zeta = \sum_{j=0}^{\infty} a_j (z - z_0)^j ,$$

where

$$a_j = \frac{1}{2\pi i} \oint_{\Gamma_2} \frac{f(\zeta)}{(\zeta - z_0)^{j+1}} \, d\zeta . \tag{2.53}$$

Moreover the series converges uniformly in the closed disk $|z - z_0| \leq R_2$, as
was shown in that section.

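The integral along $\Gamma_1$ can be treated along similar lines, except that
because $|z - z_0| > |\zeta - z_0|$, we must expand the integrand differently. We
will be brief, since the idea is very much the same as what was done for the
Taylor series. We start by rewriting $1/(\zeta - z)$ appropriately:

$$\frac{1}{\zeta - z} = \frac{1}{(\zeta - z_0) - (z - z_0)} = -\frac{1}{z - z_0}\, \frac{1}{1 - \dfrac{\zeta - z_0}{z - z_0}} .$$

Let us write this now as a geometric series:

$$\frac{1}{\zeta - z} = -\sum_{j=0}^{n} \frac{(\zeta - z_0)^j}{(z - z_0)^{j+1}} + \frac{1}{\zeta - z}\, \frac{(\zeta - z_0)^{n+1}}{(z - z_0)^{n+1}} ,$$

whence

$$\frac{1}{2\pi i} \oint_{\Gamma_1} \frac{f(\zeta)}{\zeta - z}\, d\zeta = \sum_{j=1}^{n+1} a_{-j} (z - z_0)^{-j} + S_n(z) ,$$

where

$$a_{-j} = -\frac{1}{2\pi i} \oint_{\Gamma_1} f(\zeta)\, (\zeta - z_0)^{j-1}\, d\zeta , \qquad (2.54)$$

and where

$$S_n(z) = \frac{1}{2\pi i} \oint_{\Gamma_1} \frac{f(\zeta)}{\zeta - z}\, \frac{(\zeta - z_0)^{n+1}}{(z - z_0)^{n+1}}\, d\zeta .$$

Now, for $\zeta$ in $\Gamma_1$ we have that $|\zeta - z_0| = \rho_1$ and, from the triangle inequality
(2.36), that $|\zeta - z| \ge R_1 - \rho_1$. We also note that $|z - z_0| \ge R_1$. Furthermore,
$f(\zeta)$, being continuous, is bounded so that $|f(\zeta)| \le M$ for some $M$ and all $\zeta$
on $\Gamma_1$. Therefore using (2.28) and the above inequalities,

$$|S_n(z)| \le \frac{M \rho_1}{R_1 - \rho_1} \left( \frac{\rho_1}{R_1} \right)^{n+1} ,$$

which is independent of $z$ and, because $\rho_1 < R_1$, can be made arbitrarily small
by choosing $n$ large. Hence $S_n(z) \to 0$ as $n \to \infty$ uniformly in $|z - z_0| \ge R_1$,
and

$$\frac{1}{2\pi i} \oint_{\Gamma_1} \frac{f(\zeta)}{\zeta - z}\, d\zeta = \sum_{j=1}^{\infty} a_{-j} (z - z_0)^{-j} ,$$

where the $a_{-j}$ are still given by (2.54). In other words,

$$\frac{1}{2\pi i} \oint_{\Gamma_1} \frac{f(\zeta)}{\zeta - z}\, d\zeta = \sum_{j=-\infty}^{-1} a_j (z - z_0)^j ,$$

and the series converges uniformly to the integral for $|z - z_0| \ge R_1$. In
summary, we have proven that $f(z)$ is represented by the Laurent
series

$$f(z) = \sum_{j=-\infty}^{\infty} a_j (z - z_0)^j ,$$

everywhere on $r < |z - z_0| < R$ and uniformly on any closed sub-annulus,
where the coefficients $a_j$ are given by (2.53) for $j \ge 0$ and by (2.54) for $j < 0$.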
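We are almost done, except that in the statement of the theorem the
coefficients $a_j$ are given by contour integrals along $\Gamma$ and what we have shown
is that they are given by contour integrals along $\Gamma_1$ or $\Gamma_2$. But notice that
the integrand in (2.53) is analytic in the domain bounded by the contours $\Gamma$
and $\Gamma_2$; and similarly for the integrand in (2.54) in the region bounded by
the contours $\Gamma$ and $\Gamma_1$. Therefore we can deform the contours $\Gamma_1$ and $\Gamma_2$ to
$-\Gamma$ and $\Gamma$ respectively, in the integrals (2.53) and (2.54), obtaining

$$a_j = \frac{1}{2\pi i} \oint_{\Gamma} \frac{f(\zeta)}{(\zeta - z_0)^{j+1}}\, d\zeta
\qquad\text{and}\qquad
a_{-j} = -\frac{1}{2\pi i} \oint_{-\Gamma} f(\zeta)\, (\zeta - z_0)^{j-1}\, d\zeta = \frac{1}{2\pi i} \oint_{\Gamma} f(\zeta)\, (\zeta - z_0)^{j-1}\, d\zeta ,$$

which proves the theorem.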


Laurent series are unique 


We saw in Section 2.3.3 that any power series is the Taylor series of the analytic
function it converges to. In other words, the power series representation
of an analytic function is unique (in the domain of convergence of the series, 
of course). Since Laurent series are generalisations of the Taylor series and 
agree with them when the function is analytic not just in the annulus but in 
fact in the whole disk, we might expect that the same is true and that the 
Laurent series representation of a function analytic in an annulus should also 
be unique. This turns out to be true and the proof follows basically from 
that of the uniqueness of the power series. 
More precisely, one has the following result. Let

$$\sum_{j=0}^{\infty} c_j (z - z_0)^j \qquad\text{and}\qquad \sum_{j=1}^{\infty} c_{-j} (z - z_0)^{-j}$$

be any two power series converging in $|z - z_0| < R$ and $|z - z_0| > r$, respectively,
with $R > r$. Then there is a function $f(z)$ analytic in the annulus
$r < |z - z_0| < R$, such that

$$\sum_{j=0}^{\infty} c_j (z - z_0)^j + \sum_{j=1}^{\infty} c_{-j} (z - z_0)^{-j}$$

is its Laurent series. We shall omit the proof, except to notice that this
follows from the uniqueness of the power series applied to each of the series
in turn.


© Do this in detail. 


This is a very useful result because it says that no matter how we obtain
the power series, their sum is guaranteed to be the Laurent series of the
analytic function in question. Let us illustrate this by computing the
Laurent series of some functions.

For example, let us compute the Laurent series of the rational function
$(z^2 - 2z + 3)/(z - 2)$ in the region $|z - 1| > 1$. Let us first rewrite the numerator
as a power series in $(z - 1)$:

$$z^2 - 2z + 3 = (z - 1)^2 + 2 .$$

Now we do the same with the denominator:

$$\frac{1}{z - 2} = \frac{1}{(z - 1) - 1} = \frac{1}{z - 1}\, \frac{1}{1 - \dfrac{1}{z - 1}} ,$$

where we have already left it in a form which suggests that we try a geometric
series in $1/(z - 1)$, which converges in the specified region $|z - 1| > 1$. Indeed,
we have that in this region,

$$\frac{1}{z - 2} = \frac{1}{z - 1} \sum_{j=0}^{\infty} \frac{1}{(z - 1)^j} = \sum_{j=0}^{\infty} \frac{1}{(z - 1)^{j+1}} .$$

Putting the two series together,

$$\frac{z^2 - 2z + 3}{z - 2} = \left( (z - 1)^2 + 2 \right) \sum_{j=0}^{\infty} \frac{1}{(z - 1)^{j+1}} = (z - 1) + 1 + 3 \sum_{j=1}^{\infty} \frac{1}{(z - 1)^j} .$$


By the uniqueness of the Laurent series, this is the Laurent series for the 
function in the specified region. 
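As a sanity check, the expansion just obtained can be compared numerically with the rational function itself. The following is only a sketch: the function names, the number of terms and the two sample points are illustrative choices, not part of the course material.

```python
def f(z):
    """The rational function (z^2 - 2z + 3)/(z - 2)."""
    return (z**2 - 2*z + 3) / (z - 2)

def laurent_partial_sum(z, terms=80):
    """(z - 1) + 1 + 3 * sum_{j=1}^{terms} (z - 1)^(-j); meaningful for |z - 1| > 1."""
    w = 1 / (z - 1)
    return (z - 1) + 1 + 3 * sum(w**j for j in range(1, terms + 1))

# Hypothetical sample points, both satisfying |z - 1| > 1.
samples = [3.5 + 1.0j, -1.5j]
errors = [abs(f(z) - laurent_partial_sum(z)) for z in samples]
print(errors)
```

The closer a sample point lies to the circle $|z - 1| = 1$, the more terms are needed, since the series is geometric in $1/(z - 1)$.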

As a final example, consider the function $1/((z - 1)(z - 2))$. Let us find its
Laurent expansions in the regions $|z| < 1$, $1 < |z| < 2$ and $|z| > 2$, which we
have labelled I, II and III in the figure. We start by
decomposing the function into partial fractions:

$$\frac{1}{(z - 1)(z - 2)} = \frac{1}{z - 2} - \frac{1}{z - 1} .$$


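In region I, we have the following geometric series:

$$\frac{1}{z - 1} = -\frac{1}{1 - z} = -\sum_{j=0}^{\infty} z^j \qquad\text{valid for } |z| < 1\text{; and}$$

$$\frac{1}{z - 2} = -\frac{1}{2}\, \frac{1}{1 - z/2} = -\frac{1}{2} \sum_{j=0}^{\infty} \left( \frac{z}{2} \right)^j = -\sum_{j=0}^{\infty} \frac{z^j}{2^{j+1}} \qquad\text{valid for } |z| < 2 .$$

Therefore in their common region of convergence, namely region I, we have
that

$$\frac{1}{(z - 1)(z - 2)} = \sum_{j=0}^{\infty} \left( 1 - \frac{1}{2^{j+1}} \right) z^j .$$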


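In region II, the first of the geometric series above is not valid, but the
second one is. Because in region II, $|z| > 1$, this means that $|1/z| < 1$,
whence we should try and use a geometric series in $1/z$. This is easy:

$$\frac{1}{z - 1} = \frac{1}{z}\, \frac{1}{1 - 1/z} = \frac{1}{z} \sum_{j=0}^{\infty} \left( \frac{1}{z} \right)^j = \sum_{j=0}^{\infty} \frac{1}{z^{j+1}} \qquad\text{valid for } |z| > 1 .$$

Therefore in region II we have that

$$\frac{1}{(z - 1)(z - 2)} = -\sum_{j=0}^{\infty} \frac{1}{2^{j+1}}\, z^j - \sum_{j=0}^{\infty} \frac{1}{z^{j+1}} .$$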


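Finally in region III, we have that $|z| > 2$, so that we will have to find
another series converging to $1/(z - 2)$ in this region. Again, since now $|2/z| < 1$
we should try to use a geometric series in $2/z$. This is once again easy:

$$\frac{1}{z - 2} = \frac{1}{z}\, \frac{1}{1 - 2/z} = \frac{1}{z} \sum_{j=0}^{\infty} \left( \frac{2}{z} \right)^j = \sum_{j=0}^{\infty} \frac{2^j}{z^{j+1}} \qquad\text{valid for } |z| > 2 .$$

Therefore in region III we have that

$$\frac{1}{(z - 1)(z - 2)} = \sum_{j=0}^{\infty} \left( -1 + 2^j \right) \frac{1}{z^{j+1}} .$$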


Again by the uniqueness of the Laurent series, we know that these are the 
Laurent series for the function in the specified regions. 
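The three expansions can be checked numerically, one sample point per region. This is a minimal sketch: the function names, the truncation at 200 terms and the sample points are assumptions made for illustration only.

```python
def f(z):
    """The function 1/((z - 1)(z - 2))."""
    return 1 / ((z - 1) * (z - 2))

def region_I(z, terms=200):
    """Expansion valid for |z| < 1."""
    return sum((1 - 2.0**-(j + 1)) * z**j for j in range(terms))

def region_II(z, terms=200):
    """Expansion valid for 1 < |z| < 2."""
    positive = sum(z**j / 2.0**(j + 1) for j in range(terms))
    negative = sum(z**-(j + 1) for j in range(terms))
    return -positive - negative

def region_III(z, terms=200):
    """Expansion valid for |z| > 2."""
    return sum((2.0**j - 1) * z**-(j + 1) for j in range(terms))

# Hypothetical sample points, one in each region.
checks = [
    (0.5 + 0.3j, region_I),
    (1.2 + 0.9j, region_II),
    (-3.0 + 1.0j, region_III),
]
errors = [abs(f(z) - series(z)) for z, series in checks]
print(errors)
```

Evaluating an expansion outside its own region (for instance `region_I` at a point with $|z| > 1$) does not reproduce the function, which illustrates that the Laurent series depends on the annulus as well as on the function.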


2.3.5 Zeros and Singularities 


As a consequence of the existence of power and Laurent series representations 
for analytic functions we are able to characterise the possible singularities 
that an analytic function can have, and this is the purpose of this section. 
A point $z_0$ is said to be a singularity for a function $f(z)$, if $f$ ceases to
be analytic at $z_0$. Singularities come in two types. One says that a
point $z_0$ is an isolated singularity for a function $f(z)$, if $f$ is analytic in some
punctured disk around the singularity; that is, in $0 < |z - z_0| < R$ for some
$R > 0$. We have of course already encountered isolated singularities; e.g.,
the function $1/(z - z_0)$ has an isolated singularity at $z_0$. In fact, we will
see below that the singularities of a rational function are always isolated.
Singularities need not be isolated, of course. For example, any point $-x$ in
the non-positive real axis is a singularity for the principal branch Log z of the
logarithm function which is not isolated, since any disk around $-x$, however
small, will contain other singularities. In this section we will concentrate
on isolated singularities. We will see that there are three types of isolated 
singularities, distinguished by the behaviour of the function as it approaches 
the singularity. Before doing so we will discuss the singularities of rational 
functions. As these occur at the zeros of the denominators, we will start by 
discussing zeros. 


Zeros of analytic functions 


Let $f(z)$ be analytic in a neighbourhood of a point $z_0$. This means that there
is an open disk $|z - z_0| < R$ in which $f$ is analytic. We say that $z_0$ is a zero
of $f$ if $f(z_0) = 0$. More precisely we say that $z_0$ is a zero of order $m$, for
$m = 1, 2, \ldots$, if

$$f(z_0) = f'(z_0) = f''(z_0) = \cdots = f^{(m-1)}(z_0) = 0 \qquad\text{but}\qquad f^{(m)}(z_0) \neq 0 .$$

(A zero of order $m = 1$ is often called a simple zero.) Because $f(z)$ is
analytic in the disk $|z - z_0| < R$, it has a power series representation there:
namely the Taylor series:

$$f(z) = \sum_{j=0}^{\infty} \frac{f^{(j)}(z_0)}{j!}\, (z - z_0)^j .$$

But because $z_0$ is a zero of order $m$, the first $m$ terms in the Taylor series
vanish, whence

$$f(z) = \sum_{j=m}^{\infty} \frac{f^{(j)}(z_0)}{j!}\, (z - z_0)^j = (z - z_0)^m\, g(z) ,$$

where $g(z)$ has a power series representation

$$g(z) = \sum_{j=0}^{\infty} \frac{f^{(j+m)}(z_0)}{(j+m)!}\, (z - z_0)^j$$

in the disk, whence it is analytic there and moreover, by hypothesis, $g(z_0) =
f^{(m)}(z_0)/m! \neq 0$. It follows from this that the zeros of an analytic function
are isolated. Because $g(z)$ is analytic, and hence continuous, in the disk
$|z - z_0| < R$ and $g(z_0) \neq 0$, there is a disk $|z - z_0| < \epsilon < R$ in
which $g(z) \neq 0$, and hence $f(z) = (z - z_0)^m g(z)$ has no zero there other than $z_0$ itself.

Now let P(z)/Q(z) be a rational function. Its singularities will be the 
zeroes of Q(z) and we have just seen that these are isolated, whence the 
singularities of a rational function are isolated. 


Isolated singularities 


Now let $z_0$ be an isolated singularity for a function $f(z)$. This means that $f$
is analytic in some punctured disk $0 < |z - z_0| < R$, for some $R > 0$. The
punctured disk is a degenerate case of an open annulus $r < |z - z_0| < R$,
corresponding to $r = 0$. By the results of the previous section, we know that
$f(z)$ has a Laurent series representation there. We can distinguish three
types of singularities depending on the Laurent expansion:

$$f(z) = \sum_{j=-\infty}^{\infty} a_j (z - z_0)^j .$$

Let us pay close attention to the negative powers in the Laurent expansion:
we can either have no negative powers, that is, $a_j = 0$ for all $j < 0$; a
finite number of negative powers, that is, $a_j = 0$ for all but a finite number
of $j < 0$; or an infinite number of negative powers, that is, $a_j \neq 0$ for an
infinite number of $j < 0$. This trichotomy underlies the following definitions:

• We say that $z_0$ is a removable singularity of $f$, if the Laurent expansion
  of $f$ around $z_0$ has no negative powers; that is,

  $$f(z) = \sum_{j=0}^{\infty} a_j (z - z_0)^j .$$

• We say that $z_0$ is a pole of order $m$ for $f$, if the Laurent expansion of
  $f$ around $z_0$ has $a_j = 0$ for all $j < -m$ and $a_{-m} \neq 0$; that is,

  $$f(z) = \frac{a_{-m}}{(z - z_0)^m} + \cdots + \frac{a_{-1}}{z - z_0} + a_0 + a_1 (z - z_0) + \cdots \qquad\text{with } a_{-m} \neq 0 .$$

  A pole of order $m = 1$ is often called a simple pole.

• Finally we say that $z_0$ is an essential singularity of $f$ if the Laurent
  expansion of $f$ around $z_0$ has an infinite number of nonzero terms with
  negative powers of $(z - z_0)$.

The different types of isolated singularities can be characterised by the
way the function behaves in the neighbourhood of the singularity. For a
removable singularity the function is clearly bounded as $z \to z_0$, since the
power series representation

$$f(z) = \sum_{j=0}^{\infty} a_j (z - z_0)^j = a_0 + a_1 (z - z_0) + \cdots$$

certainly has a well-defined limit as $z \to z_0$: namely, $a_0$. This is not the same
thing as saying that $f(z_0) = a_0$. If this were the case, then the function would
not have a singularity at $z_0$, but it would be analytic there as well. Therefore,
removable singularities are due to $f$ being incorrectly or “peculiarly” defined
at $z_0$. For example, consider the following bizarre-looking function:


e” for z 40; 
wt at 2 =0. 


This function is clearly analytic in the punctured plane $|z| > 0$, since it agrees
with the exponential function there, which is an entire function. This means
that in the punctured plane, $f(z)$ has a power series representation which
agrees with the Taylor series of the exponential function:

$$f(z) = \sum_{j=0}^{\infty} \frac{1}{j!}\, z^j .$$

However this series has the limit 1 as $z \to 0$, which is the value of the
exponential for $z = 0$, and this does not agree with the value of $f$ there.
Hence the function has a singularity, but one which is easy to cure: we simply
redefine $f$ at the origin so that $f(z) = \exp(z)$ throughout the complex plane.
Other examples of removable singularities are

$$\frac{\sin z}{z} = \frac{1}{z} \left( z - \frac{z^3}{3!} + \frac{z^5}{5!} - \cdots \right) = 1 - \frac{z^2}{3!} + \frac{z^4}{5!} - \cdots \qquad (2.55)$$

and

$$\frac{z^2 - 1}{z - 1} = \frac{1}{z - 1} \left( (z - 1)^2 + 2 (z - 1) \right) = (z - 1) + 2 .$$

Of course in this last example we could have simply noticed that $z^2 - 1 =
(z - 1)(z + 1)$ and simplified the rational function to $z + 1 = (z - 1) + 2$.
In summary, at a removable singularity the function is bounded and can be 
redefined at the singularity so that the new function is analytic there, in 
effect removing the singularity. 




In contrast, a pole is a true singularity for the function $f$. Indeed, around
a pole $z_0$ of order $m$, the Laurent series for $f$ looks like

$$f(z) = \frac{1}{(z - z_0)^m}\, h(z) ,$$

where $h(z)$ has a series expansion around $z_0$ given by

$$h(z) = \sum_{j=0}^{\infty} a_{j-m} (z - z_0)^j = a_{-m} + a_{-m+1} (z - z_0) + \cdots .$$

This means that $h(z)$ has at most a removable singularity at $z_0$. We have
already seen many examples of functions with poles throughout these lectures,
so we will not give more examples. Let us however pause to discuss
the singularities of a rational function.

Let $f(z) = P(z)/Q(z)$ be a rational function. Then we claim that $f(z)$
has either a pole or a removable singularity at each of the zeros of $Q(z)$. Let us be
a little bit more precise. Suppose that $z_0$ is a zero of $Q(z)$, and assume that
it is a zero of order $m$. This means that

$$Q(z) = (z - z_0)^m\, q(z) ,$$

where $q(z)$ is an analytic function around $z_0$ and such that $q(z_0) \neq 0$. If $z_0$ is
not a zero of $P(z)$, then $z_0$ is a pole of $f$ of order $m$. If $z_0$ is a zero of order
$k$ of $P(z)$, then we have that

$$P(z) = (z - z_0)^k\, p(z) ,$$

where $p(z)$ is analytic and $p(z_0) \neq 0$. Therefore we have that

$$f(z) = \frac{(z - z_0)^k\, p(z)}{(z - z_0)^m\, q(z)} = \frac{p(z)}{(z - z_0)^{m-k}\, q(z)} ,$$

whence $f(z)$ has a pole of order $m - k$ if $m > k$ and has a removable singularity
otherwise.

How about essential singularities? A result known as Picard’s Theorem
says that a function takes all possible values (with the possible exception of
one) in any neighbourhood of an essential singularity. This is a deep result in
complex analysis and one we will not even attempt to prove. Let us however
verify this for the function $f(z) = \exp(1/z)$. This function is analytic in
the punctured plane $|z| > 0$ since the exponential function is entire. For
any finite $w$ we have seen that the exponential function has a power series
expansion:

$$e^w = \sum_{j=0}^{\infty} \frac{1}{j!}\, w^j .$$

Therefore for $|z| > 0$, we have that

$$e^{1/z} = \sum_{j=0}^{\infty} \frac{1}{j!}\, \frac{1}{z^j} ,$$

whence $z_0 = 0$ is an essential singularity. According to Picard’s theorem,
the function $\exp(1/z)$ takes every possible value (except possibly one) in any
neighbourhood of the origin. Clearly, the value 0 is never attainable, but we
can easily check that any other value is obtained. Let $c \neq 0$ be any nonzero
complex number, and let us solve for those $z$ such that $\exp(1/z) = c$. The
multiple-valuedness of the logarithm says that there are infinitely many such
$z$, satisfying:

$$\frac{1}{z} = \log(c) = \operatorname{Log} |c| + i \operatorname{Arg}(c) + 2\pi i\, k ,$$

for $k = 0, \pm 1, \pm 2, \ldots$; explicitly,

$$z = \frac{\operatorname{Log} |c| - i \operatorname{Arg}(c) - 2\pi i\, k}{(\operatorname{Log} |c|)^2 + (\operatorname{Arg}(c) + 2\pi k)^2} ,$$

whose moduli are given by $|z| = \left[ (\operatorname{Log} |c|)^2 + (\operatorname{Arg}(c) + 2\pi k)^2 \right]^{-1/2}$,


which can be as small as desired by taking k as large as necessary. Therefore 
in any neighbourhood of the origin, there are an infinite number of points for 
which the function exp(1/z) takes as value a given nonzero complex number. 
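The argument above is easy to try out numerically. The sketch below solves $\exp(1/z) = c$ for a few values of $k$ and records both how well each solution satisfies the equation and how small its modulus is. The target value `c` and the chosen values of `k` are arbitrary illustrative choices; `cmath.log` returns the principal value $\operatorname{Log}|c| + i \operatorname{Arg}(c)$.

```python
import cmath
import math

def preimages(c, ks):
    """Solutions z_k of exp(1/z) = c, one for each integer k (c must be nonzero)."""
    log_c = cmath.log(c)  # principal value: Log|c| + i Arg(c)
    return [1 / (log_c + 2j * math.pi * k) for k in ks]

c = -3 + 4j              # hypothetical target value, any nonzero number would do
ks = [1, 10, 100, 1000]  # larger k gives solutions closer to the origin
zs = preimages(c, ks)

residuals = [abs(cmath.exp(1 / z) - c) for z in zs]  # how well exp(1/z) = c holds
moduli = [abs(z) for z in zs]                        # distance of each z from 0

for k, z, r in zip(ks, zs, residuals):
    print(k, abs(z), r)
```

For large $k$ the moduli behave like $1/(2\pi k)$, so every disk around the origin, however small, contains infinitely many solutions.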


2.4 The residue calculus and its applications 


We now start the final section of this part of the course. It is the culmination 
of a lot of hard work and formalism but one which is worth the effort and 
the time spent developing the necessary vocabulary. In this section we will 
study the theory of residues. The theory itself is very simple and is basically 
a matter of applying what we have learned already in the appropriate way. 
Most of the sections are applications of the theory to the computation of real 
integrals and infinite sums. These are problems which are simple to state in 
the context of real calculus but whose solutions (at least the elementary ones) 
take us to the complex plane. In a sense they provide the simplest instance of 
a celebrated phrase by the French mathematician Hadamard, who said that 
the shortest path between two real truths often passes by a complex domain. 


2.4.1 The Cauchy Residue Theorem 


Let us consider the behaviour of an analytic function around an isolated
singularity. To be precise let $z_0$ be an isolated singularity for an analytic
function $f(z)$. The function is analytic in some punctured disk $0 < |z - z_0| < R$,
for some $R > 0$, and has a Laurent series there of the form

$$f(z) = \sum_{j=-\infty}^{\infty} a_j (z - z_0)^j .$$

Consider the contour integral of the function $f(z)$ along a positively oriented
loop $\Gamma$ contained in the punctured disk and having the singularity $z_0$ in its
interior. Because the Laurent series converges uniformly, we can integrate
the series term by term:

$$\oint_{\Gamma} f(z)\, dz = \sum_{j=-\infty}^{\infty} a_j \oint_{\Gamma} (z - z_0)^j\, dz .$$

From the (generalised) Cauchy Integral Formula or simply by deforming the
contour to a circle of radius $\rho < R$, we have that (cf. equation (2.52))

$$\oint_{\Gamma} (z - z_0)^j\, dz = \begin{cases} 2\pi i & \text{for } j = -1\text{, and} \\ 0 & \text{otherwise;} \end{cases}$$

whence only the $j = -1$ term contributes to the sum, so that

$$\oint_{\Gamma} f(z)\, dz = 2\pi i\, a_{-1} .$$

This singles out the coefficient $a_{-1}$ in the Laurent series, and hence we give
it a special name. We say that $a_{-1}$ is the residue of $f$ at $z_0$, and we write
this as $\operatorname{Res}(f; z_0)$ or simply as $\operatorname{Res}(z_0)$ when $f$ is understood.

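For example, consider the function $z \exp(1/z)$. This function has an essential
singularity at the origin and is analytic everywhere else. The residue
can be computed from the Laurent series:

$$z\, e^{1/z} = z \sum_{j=0}^{\infty} \frac{1}{j!}\, \frac{1}{z^j} = z + 1 + \frac{1}{2 z} + \frac{1}{3!\, z^2} + \cdots ,$$

whence the residue is given by $\operatorname{Res}(0) = \frac{1}{2}$.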
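Since the residue is by definition $\frac{1}{2\pi i}$ times the integral around a small positively oriented loop, the value just quoted can be checked by numerical integration. The following sketch uses the trapezoid rule on the unit circle; the helper names `circle_integral` and `residue_at`, and the number of sample points, are choices made here for illustration.

```python
import cmath
import math

def circle_integral(f, center, radius, n=256):
    """Approximate the integral of f along the positively oriented circle
    |z - center| = radius, using the trapezoid rule in the angle."""
    total = 0j
    for k in range(n):
        w = cmath.exp(2j * math.pi * k / n)   # point on the unit circle
        z = center + radius * w
        total += f(z) * (1j * radius * w)     # f(z) times dz/dtheta
    return total * (2 * math.pi / n)

def residue_at(f, z0, radius=1.0, n=256):
    """Residue of f at an isolated singularity z0 enclosed by the circle."""
    return circle_integral(f, z0, radius, n) / (2j * math.pi)

res = residue_at(lambda z: z * cmath.exp(1 / z), 0)
print(res)   # compare with the coefficient of 1/z in the Laurent series
```

Note that the method does not care whether the singularity is a pole or an essential singularity; it only needs the function to be analytic on the circle itself.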

It is often not necessary to calculate the Laurent expansion in order to
extract the residue of a function at a singularity. For example, the residue
of a function at a removable singularity vanishes, since there are no negative
powers in the Laurent expansion. On the other hand, if the singularity
happens to be a pole, we will see that the residue can be computed by
differentiation.

Suppose, for simplicity, that $f(z)$ has a simple pole at $z_0$. Then the
Laurent series of $f(z)$ around $z_0$ has the form

$$f(z) = \frac{a_{-1}}{z - z_0} + a_0 + a_1 (z - z_0) + \cdots ,$$

whence the residue can be computed by

$$\operatorname{Res}(f; z_0) = \lim_{z \to z_0} (z - z_0)\, f(z) = \lim_{z \to z_0} \left( a_{-1} + a_0 (z - z_0) + a_1 (z - z_0)^2 + \cdots \right) = a_{-1} + 0 .$$

For example, the function $f(z) = e^z / (z (z + 1))$ has simple poles at $z = 0$ and
$z = -1$; therefore,

$$\operatorname{Res}(f; 0) = \lim_{z \to 0} \frac{e^z}{z + 1} = 1 \qquad\text{and}\qquad \operatorname{Res}(f; -1) = \lim_{z \to -1} \frac{e^z}{z} = -\frac{1}{e} .$$


Suppose that $f(z) = P(z)/Q(z)$ where $P$ and $Q$ are analytic at $z_0$ and $Q$
has a simple zero at $z_0$ whereas $P(z_0) \neq 0$. Clearly $f$ has a simple pole at $z_0$,
whence the residue is given by

$$\operatorname{Res}(f; z_0) = \lim_{z \to z_0} (z - z_0)\, \frac{P(z)}{Q(z)} = \lim_{z \to z_0} \frac{P(z)}{\dfrac{Q(z) - Q(z_0)}{z - z_0}} = \frac{P(z_0)}{Q'(z_0)} ,$$

where we have used that $Q(z_0) = 0$ and the definition of the derivative, which
exists since $Q$ is analytic at $z_0$.

We can use this to compute the residues at each singularity of the function
$f(z) = \cot z$. Since $\cot z = \cos z / \sin z$, the singularities occur at the zeros of
the sine function: $z = n\pi$, $n = 0, \pm 1, \pm 2, \ldots$. These zeros are simple because
$\sin'(n\pi) = \cos(n\pi) = (-1)^n \neq 0$. Therefore we can apply the above formula
to deduce that

$$\operatorname{Res}(f; n\pi) = \left. \frac{\cos z}{(\sin z)'} \right|_{z = n\pi} = \frac{\cos(n\pi)}{\cos(n\pi)} = 1 .$$


This result will be crucial for the applications concerning infinite series later 
on in this section. 
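The simple-pole formula $P(z_0)/Q'(z_0)$ lends itself to a quick numerical comparison with the definition of the residue as a contour integral. The sketch below does this for $\cot z$ at a few multiples of $\pi$ and for $e^z/(z(z+1))$ at its two poles. All helper names and the circle radii are illustrative assumptions; the derivative of $Q$ is supplied by hand rather than computed.

```python
import cmath
import math

def circle_integral(f, center, radius, n=512):
    """Trapezoid-rule approximation of the integral of f along the
    positively oriented circle |z - center| = radius."""
    total = 0j
    for k in range(n):
        w = cmath.exp(2j * math.pi * k / n)
        total += f(center + radius * w) * (1j * radius * w)
    return total * (2 * math.pi / n)

def residue_simple_pole(P, dQ, z0):
    """Residue of P/Q at a simple zero z0 of Q, given the derivative dQ of Q."""
    return P(z0) / dQ(z0)

def cot(z):
    return cmath.cos(z) / cmath.sin(z)

# cot z = cos z / sin z, so P = cos and Q' = cos; poles at z = n*pi.
cot_formula = [residue_simple_pole(cmath.cos, cmath.cos, n * math.pi)
               for n in range(-2, 3)]
cot_numeric = [circle_integral(cot, n * math.pi, 1.0) / (2j * math.pi)
               for n in range(-2, 3)]

# e^z / (z (z + 1)): P = exp, Q = z^2 + z, Q' = 2z + 1; poles at 0 and -1.
def g(z):
    return cmath.exp(z) / (z * (z + 1))

g_formula = [residue_simple_pole(cmath.exp, lambda z: 2 * z + 1, z0)
             for z0 in (0, -1)]
g_numeric = [circle_integral(g, z0, 0.5) / (2j * math.pi) for z0 in (0, -1)]

print(cot_formula, cot_numeric)
print(g_formula, g_numeric)
```

Each circle is chosen small enough to enclose exactly one pole, which is what makes the contour integral equal to $2\pi i$ times a single residue.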
Now suppose that $f$ has a pole of order $m$ at $z_0$. The Laurent expansion
is then

$$f(z) = \frac{a_{-m}}{(z - z_0)^m} + \cdots + \frac{a_{-1}}{z - z_0} + a_0 + \cdots .$$

Let us multiply this by $(z - z_0)^m$ to obtain

$$(z - z_0)^m f(z) = a_{-m} + \cdots + a_{-1} (z - z_0)^{m-1} + a_0 (z - z_0)^m + \cdots ,$$

whence taking $m - 1$ derivatives, we have

$$\frac{d^{m-1}}{dz^{m-1}} \left[ (z - z_0)^m f(z) \right] = (m - 1)!\, a_{-1} + m!\, a_0 (z - z_0) + \cdots .$$

Finally if we evaluate this at $z = z_0$, we obtain $(m - 1)!\, a_{-1}$, which then gives
a formula for the residue of $f$ at a pole of order $m$:

$$\operatorname{Res}(f; z_0) = \lim_{z \to z_0} \frac{1}{(m - 1)!}\, \frac{d^{m-1}}{dz^{m-1}} \left[ (z - z_0)^m f(z) \right] . \qquad (2.56)$$


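For example, let us compute the residues of the function

$$f(z) = \frac{\cos z}{z^2 (z - \pi)^3} .$$

This function has a pole of order 2 at the origin and a pole of order 3 at
$z = \pi$. Therefore, applying the above formula, we find

$$\operatorname{Res}(f; 0) = \lim_{z \to 0} \frac{d}{dz} \left[ z^2 f(z) \right]
= \lim_{z \to 0} \frac{d}{dz} \left[ \frac{\cos z}{(z - \pi)^3} \right]
= \lim_{z \to 0} \left[ -\frac{\sin z}{(z - \pi)^3} - \frac{3 \cos z}{(z - \pi)^4} \right]
= -\frac{3}{\pi^4} ,$$

$$\operatorname{Res}(f; \pi) = \lim_{z \to \pi} \frac{1}{2!}\, \frac{d^2}{dz^2} \left[ (z - \pi)^3 f(z) \right]
= \lim_{z \to \pi} \frac{1}{2}\, \frac{d^2}{dz^2} \left[ \frac{\cos z}{z^2} \right]
= \lim_{z \to \pi} \frac{1}{2} \left[ \frac{6 \cos z}{z^4} + \frac{4 \sin z}{z^3} - \frac{\cos z}{z^2} \right]
= -\frac{6 - \pi^2}{2 \pi^4} .$$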

We are now ready to state the main result of this section, which concerns
the formula for the integral of a function $f(z)$ which is analytic on a
positively-oriented loop $\Gamma$ and has only a finite number of isolated singularities
$\{z_k\}$ in the interior of the loop. Because of the analyticity of the function,
and using a contour deformation argument, we can express the integral of
$f(z)$ along $\Gamma$ as the sum of the integrals of $f(z)$ along positively-oriented
loops $\Gamma_k$, each one encircling one of the isolated singularities. But we have
just seen that the integral along each of these loops is given by $2\pi i$ times the
residue of the function at the singularity. That is, we have

$$\oint_{\Gamma} f(z)\, dz = \sum_k \oint_{\Gamma_k} f(z)\, dz = \sum_k 2\pi i\, \operatorname{Res}(f; z_k) .$$

In other words, we arrive at the Cauchy Residue Theorem, which states that
the integral of $f(z)$ along $\Gamma$ is equal to $2\pi i$ times the sum of the residues of
the singularities in the interior of the contour:

$$\oint_{\Gamma} f(z)\, dz = 2\pi i \sum_{\substack{\text{singularities} \\ z_k \in \operatorname{Int} \Gamma}} \operatorname{Res}(f; z_k) .$$


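For example, let us compute the integral

$$\oint_{\Gamma} \frac{1 - 2z}{z (z - 1)(z - 3)}\, dz$$

along the positively oriented circle of radius 2: $|z| = 2$. The integrand $f(z)$
has simple poles at $z = 0$, $z = 1$ and $z = 3$, but only the first two lie in the
interior of the contour. Thus by the residue theorem,

$$\oint_{\Gamma} \frac{1 - 2z}{z (z - 1)(z - 3)}\, dz = 2\pi i \left[ \operatorname{Res}(f; 0) + \operatorname{Res}(f; 1) \right] ,$$

and

$$\operatorname{Res}(f; 0) = \lim_{z \to 0} z\, f(z) = \lim_{z \to 0} \frac{1 - 2z}{(z - 1)(z - 3)} = \frac{1}{3} ,$$

$$\operatorname{Res}(f; 1) = \lim_{z \to 1} (z - 1)\, f(z) = \lim_{z \to 1} \frac{1 - 2z}{z (z - 3)} = \frac{1}{2} ,$$

so that

$$\oint_{\Gamma} \frac{1 - 2z}{z (z - 1)(z - 3)}\, dz = 2\pi i \left( \frac{1}{3} + \frac{1}{2} \right) = \frac{5 \pi i}{3} .$$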


© © Notice something curious. Computing the residue at $z = 3$, we find,

$$\operatorname{Res}(f; 3) = \lim_{z \to 3} (z - 3)\, f(z) = \lim_{z \to 3} \frac{1 - 2z}{z (z - 1)} = -\frac{5}{6} ,$$

whence the sum of all three residues is 0. This can be explained by introducing the 
Riemann sphere model for the extended complex plane, and thus noticing that a contour 
which would encompass all three singularities can be deformed to surround the point at 
infinity but in the opposite sense. Since the integrand is analytic at infinity, the Cauchy 
Integral Theorem says that the integral is zero, but (up to factors) this is equal to the 
sum of the residues. 
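Both the value of the integral and the curious vanishing of the sum of the three residues can be confirmed with a few lines of code. In this sketch the residues are computed exactly with rational arithmetic, using the simple-pole limit formula, and the contour integral is approximated by the trapezoid rule on $|z| = 2$. The helper names are illustrative.

```python
import cmath
import math
from fractions import Fraction

poles = [Fraction(0), Fraction(1), Fraction(3)]

def residue(a):
    """Exact residue of (1 - 2z)/(z (z - 1)(z - 3)) at the simple pole a."""
    denominator = Fraction(1)
    for b in poles:
        if b != a:
            denominator *= (a - b)   # remaining factors evaluated at z = a
    return (1 - 2 * a) / denominator

residues = {a: residue(a) for a in poles}
total_of_all_three = sum(residues.values())

def f(z):
    return (1 - 2 * z) / (z * (z - 1) * (z - 3))

def circle_integral(f, center, radius, n=4096):
    """Trapezoid-rule approximation of the integral of f along the
    positively oriented circle |z - center| = radius."""
    total = 0j
    for k in range(n):
        w = cmath.exp(2j * math.pi * k / n)
        total += f(center + radius * w) * (1j * radius * w)
    return total * (2 * math.pi / n)

numeric = circle_integral(f, 0, 2.0)
predicted = 2j * math.pi * float(residues[Fraction(0)] + residues[Fraction(1)])

print(residues, total_of_all_three)
print(numeric, predicted)
```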


2.4.2 Application: trigonometric integrals 


The first of the applications of the residue theorem is to the computation of
trigonometric integrals of the form

$$\int_0^{2\pi} R(\cos\theta, \sin\theta)\, d\theta ,$$

where $R$ is a rational function of its arguments and such that it is finite in the
range $0 \le \theta \le 2\pi$. We want to turn this into a complex contour integral so
that we can apply the residue theorem. One way to do this is the following.
Consider the contour $\Gamma$ parametrised by $z = \exp(i\theta)$ for $\theta \in [0, 2\pi]$: this is
the unit circle traversed once in the positive sense. On this contour, we have
$z = \cos\theta + i \sin\theta$ and $1/z = \cos\theta - i \sin\theta$. Therefore we can solve for $\cos\theta$
and $\sin\theta$ in terms of $z$ and $1/z$ as follows:

$$\cos\theta = \frac{1}{2} \left( z + \frac{1}{z} \right) \qquad\text{and}\qquad \sin\theta = \frac{1}{2i} \left( z - \frac{1}{z} \right) .$$

Similarly, $dz = d\exp(i\theta) = i z\, d\theta$, whence $d\theta = \frac{dz}{i z}$. Putting it all together we
have that

$$\int_0^{2\pi} R(\cos\theta, \sin\theta)\, d\theta = \oint_{\Gamma} \frac{1}{i z}\, R\!\left( \frac{z + \frac{1}{z}}{2}, \frac{z - \frac{1}{z}}{2i} \right) dz ,$$

which is the contour integral of a rational function of $z$, and hence can be
computed using the residue theorem:

$$\int_0^{2\pi} R(\cos\theta, \sin\theta)\, d\theta = 2\pi \sum_{\substack{\text{singularities} \\ |z_k| < 1}} \operatorname{Res}(f; z_k) , \qquad (2.57)$$

where $f(z)$ is the rational function

$$f(z) = \frac{1}{z}\, R\!\left( \frac{z + \frac{1}{z}}{2}, \frac{z - \frac{1}{z}}{2i} \right) . \qquad (2.58)$$
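As an example, let us compute the integral

$$I = \int_0^{2\pi} \frac{(\sin\theta)^2}{5 + 4 \cos\theta}\, d\theta .$$

First of all notice that the denominator never vanishes, so that we can go
ahead. The rational function $f(z)$ given in (2.58) is

$$f(z) = \frac{1}{z}\, \frac{-\frac{1}{4} \left( z - \frac{1}{z} \right)^2}{5 + 4 \cdot \frac{1}{2} \left( z + \frac{1}{z} \right)}
= -\frac{(z^2 - 1)^2}{4 z^2 (2 z^2 + 5 z + 2)}
= -\frac{1}{8}\, \frac{(z^2 - 1)^2}{z^2 \left( z + \frac{1}{2} \right) (z + 2)} ,$$

whence it has a double pole at $z = 0$ and single poles at $z = -\frac{1}{2}$ and $z = -2$.
Of these, only the poles at $z = 0$ and $z = -\frac{1}{2}$ lie inside the unit disk, whence

$$I = 2\pi \left[ \operatorname{Res}(f; 0) + \operatorname{Res}(f; -\tfrac{1}{2}) \right] .$$

Let us compute the residues. The singularity at $z = 0$ is a pole of order 2,
whence by equation (2.56), we have

$$\operatorname{Res}(f; 0) = \lim_{z \to 0} \frac{d}{dz} \left[ z^2 f(z) \right] = \lim_{z \to 0} \frac{d}{dz} \left[ -\frac{1}{8}\, \frac{(z^2 - 1)^2}{\left( z + \frac{1}{2} \right)(z + 2)} \right] = \frac{5}{16} .$$

The pole at $z = -\frac{1}{2}$ is simple, so that its residue is even simpler to compute:

$$\operatorname{Res}(f; -\tfrac{1}{2}) = \lim_{z \to -\frac{1}{2}} \left( z + \tfrac{1}{2} \right) f(z) = \lim_{z \to -\frac{1}{2}} \left[ -\frac{1}{8}\, \frac{(z^2 - 1)^2}{z^2 (z + 2)} \right] = -\frac{3}{16} .$$

Therefore,

$$I = 2\pi \left( \frac{5}{16} - \frac{3}{16} \right) = \frac{\pi}{4} .$$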


As a mild check on our result, we notice that it is real whence it is not 
obviously wrong. 
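A less mild check is to carry out the whole recipe numerically. The sketch below builds $f(z)$ from $R$ exactly as in (2.58), obtains the two residues inside the unit disk from small contour integrals, and compares $2\pi$ times their sum with a direct quadrature of the original trigonometric integral. The function names, circle radii and numbers of sample points are assumptions made for illustration.

```python
import cmath
import math

def R(c, s):
    """The rational function R(cos, sin) = sin^2 / (5 + 4 cos)."""
    return s**2 / (5 + 4 * c)

def f(z):
    """The function (2.58): (1/z) R((z + 1/z)/2, (z - 1/z)/(2i))."""
    return R((z + 1 / z) / 2, (z - 1 / z) / 2j) / z

def circle_integral(g, center, radius, n=4096):
    """Trapezoid-rule approximation of the integral of g along the
    positively oriented circle |z - center| = radius."""
    total = 0j
    for k in range(n):
        w = cmath.exp(2j * math.pi * k / n)
        total += g(center + radius * w) * (1j * radius * w)
    return total * (2 * math.pi / n)

# Residues at the two poles inside the unit disk; the circles of radius 1/4
# around 0 and -1/2 each enclose exactly one pole.
res_0 = circle_integral(f, 0, 0.25) / (2j * math.pi)
res_half = circle_integral(f, -0.5, 0.25) / (2j * math.pi)
I_residues = 2 * math.pi * (res_0 + res_half)

def direct_quadrature(n=2000):
    """Trapezoid rule for the original integral over [0, 2*pi]."""
    h = 2 * math.pi / n
    return h * sum(math.sin(k * h)**2 / (5 + 4 * math.cos(k * h)) for k in range(n))

I_direct = direct_quadrature()
print(res_0, res_half)
print(I_residues, I_direct, math.pi / 4)
```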
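Let us do another example:

$$I = \int_0^{\pi} \frac{d\theta}{2 - \cos\theta} .$$

This time the integral is only over $[0, \pi]$, so that we cannot immediately use
the residue theorem. However in this case we notice that because $\cos(2\pi - \theta) = \cos\theta$,
we have that

$$\int_{\pi}^{2\pi} \frac{d\theta}{2 - \cos\theta} = \int_{\pi}^{0} \frac{d(2\pi - \theta)}{2 - \cos(2\pi - \theta)} = -\int_{\pi}^{0} \frac{d\theta}{2 - \cos\theta} = \int_0^{\pi} \frac{d\theta}{2 - \cos\theta} .$$

Therefore,

$$I = \frac{1}{2} \int_0^{2\pi} \frac{d\theta}{2 - \cos\theta} ,$$

which using equation (2.57) and paying close attention to the factor of $\frac{1}{2}$,
becomes $\pi$ times the sum of the residues of the function

$$f(z) = \frac{1}{z}\, \frac{1}{2 - \frac{1}{2} \left( z + \frac{1}{z} \right)}$$

lying inside the unit disk. After a little bit of algebra, we find that

$$f(z) = \frac{-2}{z^2 - 4 z + 1} = \frac{-2}{(z - 2 + \sqrt{3})(z - 2 - \sqrt{3})} .$$

Of the two simple poles of this function only the one at $z = 2 - \sqrt{3}$ lies inside
the unit disk, hence

$$\operatorname{Res}(f; 2 - \sqrt{3}) = \lim_{z \to 2 - \sqrt{3}} \frac{-2}{z - 2 - \sqrt{3}} = \frac{1}{\sqrt{3}} ,$$

and thus the integral becomes

$$I = \frac{\pi}{\sqrt{3}} .$$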


2.4.3 Application: improper integrals 


In this section we consider improper integrals of rational functions and of
products of rational and trigonometric functions.

Let $f(x)$ be a function of a real variable $x$ which is continuous in $0 \le x <
\infty$. Then by the improper integral $\int_0^{\infty} f(x)\, dx$, we mean the limit

$$\int_0^{\infty} f(x)\, dx = \lim_{R \to \infty} \int_0^{R} f(x)\, dx ,$$

if such a limit exists. Similarly, if $f(x)$ is continuous in $-\infty < x \le 0$, then
the improper integral $\int_{-\infty}^{0} f(x)\, dx$ is defined by the limit

$$\int_{-\infty}^{0} f(x)\, dx = \lim_{r \to -\infty} \int_r^{0} f(x)\, dx ,$$

again provided that it exists. If $f(x)$ is continuous on the whole real line and
both of the above limits exist, we define

$$\int_{-\infty}^{\infty} f(x)\, dx = \lim_{r \to -\infty} \int_r^{0} f(x)\, dx + \lim_{R \to \infty} \int_0^{R} f(x)\, dx . \qquad (2.59)$$

If such limits exist, then we get the same result by symmetric integration:

$$\int_{-\infty}^{\infty} f(x)\, dx = \lim_{\rho \to \infty} \int_{-\rho}^{\rho} f(x)\, dx . \qquad (2.60)$$

Notice however that the symmetric integral may exist even if the improper
integral (2.59) does not. For example consider the function $f(x) = x$. Clearly
the integrals $\int_0^{\infty} x\, dx$ and $\int_{-\infty}^{0} x\, dx$ do not exist, yet because $x$ is an odd
function, $\int_{-\rho}^{\rho} x\, dx = 0$ for all $\rho$, whence the limit is 0. In cases like this we
say that equation (2.60) defines the Cauchy principal value of the integral,
and we denote this by

$$\text{p.v.} \int_{-\infty}^{\infty} f(x)\, dx = \lim_{\rho \to \infty} \int_{-\rho}^{\rho} f(x)\, dx .$$

We stress that whenever the improper integral (2.59) exists it
agrees with its principal value (2.60).


Improper integrals of rational functions over (−∞, ∞)


Let us consider as an example the improper integral

$$I = \text{p.v.} \int_{-\infty}^{\infty} \frac{dx}{x^2 + 4} = \lim_{\rho \to \infty} \int_{-\rho}^{\rho} \frac{dx}{x^2 + 4} .$$

The integral for finite $\rho$ can be interpreted as the complex integral of the
function $f(z) = 1/(z^2 + 4)$,

$$\int_{\gamma_\rho} \frac{dz}{z^2 + 4} ,$$

where $\gamma_\rho$ is the straight line segment on the real axis: $y = 0$ and $-\rho \le x \le \rho$.
In order to use the residue theorem we need to close the contour; that is, we
must produce a closed contour along which we can apply the residue theorem.
Of course, in so doing we are introducing a further integral, and the success
of the method depends on whether the extra integral is computable. We will
see that in this case, the extra integral, if chosen judiciously, vanishes.

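Let us therefore complete the contour $\gamma_\rho$ to a closed
contour. One suggestion is to consider the semicircular
contour $C_\rho^+$ in the upper half plane, parametrised by
$z(t) = \rho \exp(i t)$, for $t \in [0, \pi]$. Let $\Gamma_\rho$ be the composition
of both contours: it is a closed contour as shown
in the figure. Then, according to the residue theorem,

$$\oint_{\Gamma_\rho} \frac{dz}{z^2 + 4} = \int_{\gamma_\rho} \frac{dz}{z^2 + 4} + \int_{C_\rho^+} \frac{dz}{z^2 + 4} = 2\pi i \sum_{\substack{\text{singularities} \\ z_k \in \operatorname{Int} \Gamma_\rho}} \operatorname{Res}(f; z_k) ,$$

whence

$$\int_{\gamma_\rho} \frac{dz}{z^2 + 4} = 2\pi i \sum_{\substack{\text{singularities} \\ z_k \in \operatorname{Int} \Gamma_\rho}} \operatorname{Res}(f; z_k) - \int_{C_\rho^+} \frac{dz}{z^2 + 4} .$$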


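We will now argue that the integral along $C_\rho^+$ vanishes in the limit $\rho \to \infty$.
Of course, this is done using (2.28):

$$\left| \int_{C_\rho^+} \frac{dz}{z^2 + 4} \right| \le \int_{C_\rho^+} \frac{|dz|}{|z^2 + 4|} . \qquad (2.61)$$

Using the triangle inequality (2.36), we have that on $C_\rho^+$,

$$|z^2 + 4| \ge |z^2| - 4 = |z|^2 - 4 = \rho^2 - 4 ,$$

whence

$$\frac{1}{|z^2 + 4|} \le \frac{1}{\rho^2 - 4} .$$

Plugging this into (2.61), and taking into account that the length of the
semicircle $C_\rho^+$ is $\pi\rho$,

$$\left| \int_{C_\rho^+} \frac{dz}{z^2 + 4} \right| \le \frac{\pi \rho}{\rho^2 - 4} \to 0 \qquad\text{as } \rho \to \infty .$$

Therefore in the limit,

$$\text{p.v.} \int_{-\infty}^{\infty} \frac{dx}{x^2 + 4} = 2\pi i \sum_{\substack{\text{singularities} \\ z_k \in \operatorname{Int} \Gamma_\rho}} \operatorname{Res}(f; z_k) .$$

The function $f(z)$ has poles at $z = \pm 2i$, of which only the one at $z = 2i$ lies
inside the closed contour $\Gamma_\rho$, for large $\rho$. Computing the residue there, we
find from (2.56) that

$$\operatorname{Res}(f; 2i) = \lim_{z \to 2i} \left[ \frac{1}{z + 2i} \right] = \frac{1}{4i} ,$$

and hence the integral is given by

$$I = \text{p.v.} \int_{-\infty}^{\infty} \frac{dx}{x^2 + 4} = 2\pi i\, \frac{1}{4i} = \frac{\pi}{2} .$$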


There is no reason why we chose to close the contour using the top semicircle $C_\rho^+$ instead
of using the bottom semicircle $C_\rho^-$, parametrised by $z(t) = \rho \exp(i t)$ for $t \in [\pi, 2\pi]$. The
same argument shows that in the limit $\rho \to \infty$ the integral along $C_\rho^-$ vanishes. It is now
the pole at $-2i$ that we have to take into account, and one has that $\operatorname{Res}(f; -2i) = -1/4i$.
Notice however that the closed contour is negatively-oriented, which produces an extra
minus sign from the residue formula, in such a way that the final answer is again

$$I = -2\pi i \times \left( -\frac{1}{4i} \right) = \frac{\pi}{2} .$$
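The way the semicircle contribution dies off can be seen numerically. For a few values of $\rho$, the sketch below evaluates the integral over the real segment in closed form (an antiderivative of $1/(x^2+4)$ is $\frac{1}{2}\arctan(x/2)$), approximates the integral over the upper semicircle with the midpoint rule, and compares the latter with the bound $\pi\rho/(\rho^2 - 4)$. The function name and the chosen radii are illustrative.

```python
import cmath
import math

def arc_integral(rho, n=20000):
    """Midpoint-rule approximation of the integral of dz/(z^2 + 4) along the
    upper semicircle z = rho * exp(i t), t in [0, pi]."""
    h = math.pi / n
    total = 0j
    for k in range(n):
        t = (k + 0.5) * h
        z = rho * cmath.exp(1j * t)
        total += (1j * z) / (z * z + 4) * h   # f(z) dz with dz = i z dt
    return total

rows = []
for rho in (5.0, 50.0, 500.0):
    segment = math.atan(rho / 2)              # exact integral over [-rho, rho]
    arc = arc_integral(rho)
    bound = math.pi * rho / (rho**2 - 4)      # estimate from the text
    rows.append((rho, segment, arc, bound))
    print(rho, segment, abs(arc), bound, segment + arc)
```

For every $\rho > 2$ the segment and the arc add up to $2\pi i \operatorname{Res}(f; 2i) = \pi/2$; only the split between the two changes, with the arc's share shrinking as $\rho$ grows.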

The technique employed in the calculation of the above integral can be
applied in more general situations. All that we require is for the integral
along the large semicircle $C_\rho^+$ to vanish and this translates into a condition
on the behaviour of the integrand for large $|z|$.

We will now show the following general result. Let $R(x) = P(x)/Q(x)$
be a rational function of a real variable satisfying the following two criteria:

• $Q(x) \neq 0$; and
• $\deg Q - \deg P \ge 2$.

Then the improper integral of $R(x)$ along the real line is given by considering
the residues of the complex rational function $R(z)$ at its singularities in the
upper half-plane. Being a rational function the only singularities are either
removable or poles, and only these latter ones contribute to the residue. In
summary,

$$\text{p.v.} \int_{-\infty}^{\infty} R(x)\, dx = 2\pi i \sum_{\substack{\text{poles } z_k \\ \operatorname{Im}(z_k) > 0}} \operatorname{Res}(R; z_k) . \qquad (2.62)$$

The proof of this relation follows the same steps as in the computation
of the integral $I$ above. The trick is to close the contour using the upper
semicircle $C_\rho^+$ and then argue that the integral along the semicircle vanishes.
This is guaranteed by the behaviour of $R(z)$ for large $|z|$.


© Let us do this in detail. The integral to be computed is

$$I = \text{p.v.} \int_{-\infty}^{\infty} R(x)\, dx = \lim_{\rho \to \infty} \int_{-\rho}^{\rho} R(x)\, dx = \lim_{\rho \to \infty} \int_{\gamma_\rho} R(z)\, dz .$$

Closing the contour with $C_\rho^+$ to $\Gamma_\rho$, we have

$$\int_{\gamma_\rho} R(z)\, dz = \oint_{\Gamma_\rho} R(z)\, dz - \int_{C_\rho^+} R(z)\, dz .$$

The first integral in the right-hand side can be easily dispatched using the residue theorem.
In the limit $\rho \to \infty$, one finds

$$\lim_{\rho \to \infty} \oint_{\Gamma_\rho} R(z)\, dz = 2\pi i \sum_{\substack{\text{poles } z_k \\ \operatorname{Im}(z_k) > 0}} \operatorname{Res}(R; z_k) .$$

All that remains then is to show that the second integral vanishes in the limit $\rho \to \infty$.
We can estimate it using (2.28) as usual:

$$\left| \int_{C_\rho^+} R(z)\, dz \right| \le \int_{C_\rho^+} |R(z)|\, |dz| = \int_{C_\rho^+} \frac{|P(z)|}{|Q(z)|}\, |dz| .$$

Let the degree of the polynomial $P(z)$ be $p$ and that of $Q(z)$ be $q$, where by hypothesis
we have that $q - p \ge 2$. Recall from our discussion in Section 2.2.6 that for large $|z|$ a
polynomial $P(z)$ of degree $N$ behaves like $|P(z)| \sim c |z|^N$ for some $c$. Similar considerations
in this case show that the rational function $R(z) = P(z)/Q(z)$ with $q = \deg Q > \deg P = p$
obeys

$$|R(z)| \le \frac{c}{|z|^{q-p}} ,$$

for some constant $c$ independent of $|z|$. Using this in the estimate of the integral along
$C_\rho^+$, and using that the semicircle has length $\pi\rho$,

$$\left| \int_{C_\rho^+} R(z)\, dz \right| \le \frac{\pi \rho\, c}{\rho^{q-p}} .$$

Since $q - p \ge 2$, we have that this goes to zero in the limit $\rho \to \infty$, as desired.


As an example, let us compute the following integral

$$I = \text{p.v.} \int_{-\infty}^{\infty} \frac{x^2}{(x^2 + 1)^2}\, dx .$$

The integrand is rational and obeys the two criteria above: it is always finite
and the degree of the denominator is 4 whereas that of the numerator is 2,
whence $4 - 2 \ge 2$. In order to compute the integral it is enough to compute
the residues of the rational function

$$f(z) = \frac{z^2}{(z^2 + 1)^2}$$

at the poles in the upper half-plane. This function has poles of order 2 at
the points $z = \pm i$, of which only $z = +i$ is in the upper half-plane, hence
from (2.62) we have

$$I = 2\pi i\, \operatorname{Res}(f; i) = 2\pi i \lim_{z \to i} \frac{d}{dz} \left[ \frac{z^2}{(z + i)^2} \right] = 2\pi i \lim_{z \to i} \frac{2 i z}{(z + i)^3} = 2\pi i\, \frac{1}{4i} = \frac{\pi}{2} .$$
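Formula (2.62) can be tested on this example without doing any differentiation by hand. In the sketch below the residue at $z = i$ is obtained from a small contour integral, and the resulting value of $I$ is compared with a direct quadrature of the real integral, carried out after the substitution $x = \tan t$ so that the whole real line is mapped to $(-\pi/2, \pi/2)$. The helper names, the substitution and the sample counts are choices made here, not part of the notes.

```python
import cmath
import math

def R(x):
    """The rational function x^2 / (x^2 + 1)^2; works for real or complex x."""
    return x**2 / (x**2 + 1)**2

def circle_integral(f, center, radius, n=2048):
    """Trapezoid-rule approximation of the integral of f along the
    positively oriented circle |z - center| = radius."""
    total = 0j
    for k in range(n):
        w = cmath.exp(2j * math.pi * k / n)
        total += f(center + radius * w) * (1j * radius * w)
    return total * (2 * math.pi / n)

# Residue at the only pole in the upper half-plane, z = i.
res_i = circle_integral(R, 1j, 1.0) / (2j * math.pi)
I_residues = 2j * math.pi * res_i

def real_line_integral(g, n=20000):
    """Midpoint rule for the integral of g over the real line, using
    x = tan(t), dx = (1 + x^2) dt, with t in (-pi/2, pi/2)."""
    h = math.pi / n
    total = 0.0
    for k in range(n):
        x = math.tan(-math.pi / 2 + (k + 0.5) * h)
        total += g(x) * (1 + x * x) * h
    return total

I_direct = real_line_integral(R)
print(res_i, I_residues, I_direct)
```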


Improper integrals of rational and trigonometric functions 


The next type of integrals which can be handled by the method of residues
are of the kind

$$\text{p.v.} \int_{-\infty}^{\infty} R(x) \cos(a x)\, dx \qquad\text{and}\qquad \text{p.v.} \int_{-\infty}^{\infty} R(x) \sin(a x)\, dx ,$$

where $R(x)$ is a rational function which is continuous everywhere in the real
line (except maybe at the zeros of $\cos(a x)$ and $\sin(a x)$, depending on the
integral), and where $a$ is a nonzero real number.

As an example, consider the integral 


oo p 
l= pv. f souls) dz = lim f aca dx 


œ T? +4 poj p H4 0” 


From the discussion in the previous section, we are tempted to try to express 
the integral over |[—p, p] as a complex contour integral, close the contour and 
use the residue theorem. Notice however that we cannot use the function 
cos(3z)/(z? +4) because | cos(3z)| is not bounded for large values of | Im(z)]. 
Instead we notice that we can write the integral as the real part of a complex 
integral J = Re(Jp), where 
p oie 
Ío = eae : 


Therefore let us consider the integral

\[ \int_{-\rho}^{\rho} \frac{e^{i3x}}{x^2+4}\,dx = \int_{\gamma_\rho} \frac{e^{i3z}}{z^2+4}\,dz , \]

where $\gamma_\rho$ is the line segment on the real axis from $-\rho$ to $\rho$. We would like to
close this contour to be able to use the residue theorem, and in such a way
that the integral vanishes on the extra segment that we must add to close it.
Let us consider the upper semicircle $C_\rho^+$. There we have that


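\[ \left|\frac{e^{i3z}}{z^2+4}\right| = \frac{e^{-3\operatorname{Im}(z)}}{|z^2+4|} \leq \frac{e^{-3\operatorname{Im}(z)}}{\rho^2-4} , \]

where to reach the inequality we used (2.36) as was done above. The function
$e^{-3\operatorname{Im}(z)}$ is bounded above by 1 in the upper half-plane, and in particular along
$C_\rho^+$, hence we have that on the semicircle,

\[ \left|\frac{e^{i3z}}{z^2+4}\right| \leq \frac{1}{\rho^2-4} . \]

Therefore the integral along the semicircle is bounded above by

\[ \left|\int_{C_\rho^+} \frac{e^{i3z}}{z^2+4}\,dz\right| \leq \frac{\pi\rho}{\rho^2-4} \to 0 \quad\text{as } \rho\to\infty . \]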


Therefore we can use the residue theorem to express $I_0$ in terms of the residues
of the function $f(z) = \exp(i3z)/(z^2+4)$ at the poles in the upper half-plane.
This function has simple poles at $z = \pm 2i$, but only $z = 2i$ lies in the upper
half-plane, whence

\[ I_0 = 2\pi i\,\operatorname{Res}(f;2i) = 2\pi i \lim_{z\to 2i} \frac{e^{i3z}}{z+2i} = 2\pi i\,\frac{e^{-6}}{4i} = \frac{\pi}{2e^6} , \]

which is already real. (One could have seen this because the imaginary part is
the integral of $\sin(3x)/(x^2+4)$, which is an odd function and hence integrates
to zero under symmetric integration.) Therefore,

\[ I = \operatorname{Re}(I_0) = \frac{\pi}{2e^6} . \]
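The value $\pi/(2e^6)$ can be checked numerically in two independent ways. The sketch below approximates the residue at $z = 2i$ by a small contour integral and, separately, integrates $\cos(3x)/(x^2+4)$ over a large but finite interval. The function names, the circle radius and the cutoff are assumptions made for this illustration.

```python
import cmath
import math

def residue(f, z0, radius=0.5, n=256):
    """Approximate Res(f; z0) as (1/(2*pi*i)) times the integral of f over the
    circle |z - z0| = radius, which is assumed to enclose no other singularity."""
    total = 0j
    for k in range(n):
        w = radius * cmath.exp(2j * math.pi * k / n)
        total += f(z0 + w) * w  # dz = i*w*dtheta; the i cancels against 1/(2*pi*i)
    return total / n

def f(z):
    return cmath.exp(3j * z) / (z * z + 4)

I0 = 2j * math.pi * residue(f, 2j)

def truncated_integral(cutoff=1000.0, n=400000):
    """Trapezoid rule for cos(3x)/(x^2 + 4) over [-cutoff, cutoff]."""
    h = 2 * cutoff / n
    g = lambda x: math.cos(3 * x) / (x * x + 4)
    total = 0.5 * (g(-cutoff) + g(cutoff))
    for k in range(1, n):
        total += g(-cutoff + k * h)
    return total * h

numeric = truncated_integral()
print(I0, numeric, math.pi / (2 * math.exp(6)))
```

The truncated integral differs from the principal value only by the two tails beyond the cutoff, which are small because the integrand decays like $1/x^2$.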


Suppose instead that we had wanted to compute the integral

\[ \mathrm{p.v.}\int_{-\infty}^{\infty} \frac{e^{-i3x}}{x^2+4}\,dx . \]

Of course, now we could do it because this is the complex conjugate of the
integral we have just computed, but let us assume that we had not yet done
the other integral. We would follow the same steps as before, but notice that
now,

\[ \left|\frac{e^{-i3z}}{z^2+4}\right| = \frac{e^{3\operatorname{Im}(z)}}{|z^2+4|} , \]

which is no longer bounded in the upper half-plane. In this case we would be
forced to close the contour using the lower semicircle $C_\rho^-$, keeping in mind
that the closed contour is now negatively oriented. The lesson to learn from
this is that there is some choice in how to close the contour and that one has
to exercise this choice judiciously for the calculation to work out.

This method of course generalises to compute integrals of the form

\[ \mathrm{p.v.}\int_{-\infty}^{\infty} R(x)\,e^{iax}\,dx \tag{2.63} \]

where $a$ is real. Surprisingly the conditions on the rational function $R(x)$ are
now slightly weaker. Indeed, we have the following general result.

Let $R(x) = P(x)/Q(x)$ be a rational function satisfying the following
conditions:

• $Q(x) \neq 0$;³ and
• $\deg Q - \deg P \geq 1$.

Then the improper integral is given by considering the residues of the
function $f(z) = R(z)e^{iaz}$ at its singularities in the upper (if $a > 0$) or lower
(if $a < 0$) half-planes. These singularities are either removable or poles, and
again only the poles contribute to the residues. In summary,

\[ \mathrm{p.v.}\int_{-\infty}^{\infty} R(x)\,e^{iax}\,dx = \begin{cases} \displaystyle 2\pi i \sum_{\substack{\text{poles } z_k \\ \operatorname{Im}(z_k)>0}} \operatorname{Res}(f;z_k) & \text{if } a > 0; \\[2ex] \displaystyle -2\pi i \sum_{\substack{\text{poles } z_k \\ \operatorname{Im}(z_k)<0}} \operatorname{Res}(f;z_k) & \text{if } a < 0. \end{cases} \tag{2.64} \]


This result is similar to the one for purely rational integrands, with two
important differences. The first is
that we have to choose the contour appropriately depending on the integrand;
that is, depending on the sign of $a$. The second one is that the condition on
the rational function is less restrictive than before: now we simply demand
that the degree of $Q$ be greater than the degree of $P$. This will therefore
require a more refined estimate of the integral along the semicircle, which
goes by the name of the Jordan lemma, which states that

\[ \lim_{\rho\to\infty} \int_{C_\rho^+} e^{iaz}\,\frac{P(z)}{Q(z)}\,dz = 0 , \]

whenever $a > 0$ and $\deg Q > \deg P$. Of course an analogous result holds for
$a < 0$ and along $C_\rho^-$.


³ This could in principle be relaxed provided the zeros of $Q$ at most gave rise to
removable singularities in the integrand.

Let us prove this lemma. Parametrise the semicircle $C_\rho^+$ by $z(t) = \rho\exp(it)$ for $t \in [0,\pi]$.
Then by (2.25)

\[ \int_{C_\rho^+} e^{iaz}\,\frac{P(z)}{Q(z)}\,dz = \int_0^\pi e^{ia\rho e^{it}}\,\frac{P(\rho e^{it})}{Q(\rho e^{it})}\,\rho\,i\,e^{it}\,dt . \]


Let us now estimate the integrand term by term. First we have that

\[ \left|e^{ia\rho e^{it}}\right| = \left|e^{ia\rho(\cos t + i\sin t)}\right| = e^{-a\rho\sin t} . \]

Similarly, since $\deg Q - \deg P \geq 1$, we have that

\[ \left|\frac{P(\rho e^{it})}{Q(\rho e^{it})}\right| \leq \frac{c}{\rho} \]

for $\rho$ large, for some $c > 0$. Now using (2.24) on the $t$-integral together with the above
(in)equalities,

\[ \left|\int_{C_\rho^+} e^{iaz}\,\frac{P(z)}{Q(z)}\,dz\right| = \left|\int_0^\pi e^{ia\rho e^{it}}\,\frac{P(\rho e^{it})}{Q(\rho e^{it})}\,\rho\,i\,e^{it}\,dt\right| \leq c\int_0^\pi e^{-a\rho\sin t}\,dt . \]

We need to show that this latter integral goes to zero in the limit $\rho \to \infty$. First of all
notice that $\sin t = \sin(\pi - t)$ for $t \in [0,\pi]$, whence

\[ \int_0^\pi e^{-a\rho\sin t}\,dt = 2\int_0^{\pi/2} e^{-a\rho\sin t}\,dt . \]

Next notice that for $t \in [0,\pi/2]$, $\sin t \geq 2t/\pi$. This can be seen pictorially by
plotting the function $\sin t$ in the range $t \in [-\pi,\pi]$ together with the function
$2t/\pi$ in the range $t \in [0,\pi/2]$, which makes the inequality manifest.

Therefore,

\[ \int_0^{\pi/2} e^{-a\rho\sin t}\,dt \leq \int_0^{\pi/2} e^{-2a\rho t/\pi}\,dt = \frac{\pi}{2a\rho}\left(1 - e^{-a\rho}\right) . \]

Putting this all together, we see that

\[ \left|\int_{C_\rho^+} e^{iaz}\,\frac{P(z)}{Q(z)}\,dz\right| \leq \frac{c\,\pi}{a\rho}\left(1 - e^{-a\rho}\right) , \]

which clearly goes to 0 in the limit $\rho \to \infty$, proving the lemma.
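The key estimate in the proof, namely that $\int_0^\pi e^{-a\rho\sin t}\,dt$ is bounded by $(\pi/a\rho)(1 - e^{-a\rho})$, is easy to probe numerically. The following sketch does so for a few values of the product $a\rho$; the function names `lhs` and `rhs` are chosen here for illustration only.

```python
import math

def lhs(a_rho, n=20000):
    """Midpoint-rule value of the integral of exp(-a*rho*sin t) over [0, pi]."""
    h = math.pi / n
    return h * sum(math.exp(-a_rho * math.sin((k + 0.5) * h)) for k in range(n))

def rhs(a_rho):
    """The bound (pi/(a*rho)) * (1 - exp(-a*rho)) that follows from sin t >= 2t/pi."""
    return math.pi / a_rho * (1 - math.exp(-a_rho))

for a_rho in (1.0, 10.0, 100.0, 1000.0):
    print(a_rho, lhs(a_rho), rhs(a_rho))
```

In each row the first number should not exceed the second, and both should shrink roughly like $1/(a\rho)$.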


As an example, let us compute the integral

\[ I = \mathrm{p.v.}\int_{-\infty}^{\infty} \frac{x\sin x}{1+x^2}\,dx . \]

This is the imaginary part of the integral

\[ I_0 = \mathrm{p.v.}\int_{-\infty}^{\infty} \frac{x\,e^{ix}}{1+x^2}\,dx , \]

which satisfies the conditions which permit the use of (2.64) with $a = 1$ and
$R(z) = z/(1+z^2)$. This rational function has simple poles for $z = \pm i$, but
only $z = i$ lies in the upper half-plane. According to (2.64) then, and letting
$f(z) = R(z)e^{iz}$, we have

\[ I_0 = 2\pi i\,\operatorname{Res}(f;i) = 2\pi i \lim_{z\to i}\left[\frac{z\,e^{iz}}{z+i}\right] = \frac{i\pi}{e} \quad\Longrightarrow\quad I = \frac{\pi}{e} . \]
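Since this integrand decays only like $1/x$, a direct numerical quadrature converges slowly, so the sketch below checks the result through the residue instead. It approximates $\operatorname{Res}(f;i)$ by a small contour integral around $z = i$; the helper `residue` and its default parameters are assumptions of this illustration.

```python
import cmath
import math

def residue(f, z0, radius=0.5, n=256):
    """Approximate Res(f; z0) as (1/(2*pi*i)) times the integral of f over the
    circle |z - z0| = radius, which is assumed to enclose no other singularity."""
    total = 0j
    for k in range(n):
        w = radius * cmath.exp(2j * math.pi * k / n)
        total += f(z0 + w) * w  # dz = i*w*dtheta; the i cancels against 1/(2*pi*i)
    return total / n

def f(z):
    return z * cmath.exp(1j * z) / (1 + z * z)

I0 = 2j * math.pi * residue(f, 1j)
I = I0.imag

print(I0, I, math.pi / math.e)
```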


Improper integrals of rational functions on $(0, \infty)$


The next type of integrals which can be tackled using the residue theorem
are integrals of rational functions but over the half line; that is, integrals of
the form:

\[ \int_0^\infty R(x)\,dx \]

where $R(x)$ is continuous for $x \geq 0$. Of course, if $R(x)$ were an even function,
i.e., $R(-x) = R(x)$, then we would have $\int_0^\infty R(x)\,dx = \frac12\int_{-\infty}^\infty R(x)\,dx$, and
we could use the method discussed previously. However for more general
integrands, this does not work and we have to do something different.

The following general result is true. Let $R(x) = P(x)/Q(x)$ be a rational
function of a real variable satisfying the following two conditions

• $Q(x) \neq 0$ for $x \geq 0$; and
• $\deg Q - \deg P \geq 2$.

Further let $f(z) = \log(z)\,R(z)$ with the branch of the logarithm chosen to
be analytic at the poles $\{z_k\}$ of $R$; for example, we can choose the branch
$\operatorname{Log}_0(z)$ which has the cut along the positive real axis, since $Q(x)$ has no
zeros there. Then,

\[ \int_0^\infty R(x)\,dx = -\sum_{\text{poles } z_k} \operatorname{Res}(f;z_k), \quad\text{for } f(z) = \log(z)\,R(z). \tag{2.65} \]

© The details.

This same method can be applied to integrals of the form

\[ \int_a^\infty R(x)\,dx , \]

where the rational function $R(x) = P(x)/Q(x)$ satisfies the same conditions
as above except that now $Q(x) \neq 0$ only for $x \geq a$. In this case we must
consider the function $f(z) = \log(z - a)\,R(z)$.

Details?

Similarly, since $\int_a^b = \int_a^\infty - \int_b^\infty$, we can use this method to compute
indefinite integrals of rational functions.
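To see formula (2.65) at work, here is a small numerical sketch for the integrand $R(x) = 1/(1+x^3)$. This example is not taken from the notes; it is chosen because $R$ is not even, has no zeros of the denominator on $x \geq 0$, and has a known integral $2\pi/(3\sqrt3)$. The function `log0` is a hypothetical implementation of the branch with $0 \leq \arg z < 2\pi$.

```python
import cmath
import math

def log0(z):
    """Branch of the logarithm cut along the positive real axis: 0 <= arg z < 2*pi."""
    angle = cmath.phase(z)
    if angle < 0:
        angle += 2 * math.pi
    return complex(math.log(abs(z)), angle)

# Poles of R(z) = 1/(z^3 + 1): the three cube roots of -1.
poles = [cmath.exp(1j * math.pi * (2 * k + 1) / 3) for k in range(3)]

# The poles are simple, so Res(log0(z) R(z); z_k) = log0(z_k) / (3 z_k^2).
value = -sum(log0(zk) / (3 * zk**2) for zk in poles)

def half_line_integral(n=20000):
    """Midpoint rule for the integral of 1/(1+x^3) over (0, infinity), via x = tan(theta)."""
    h = (math.pi / 2) / n
    total = 0.0
    for k in range(n):
        theta = (k + 0.5) * h
        x = math.tan(theta)
        total += 1.0 / ((1 + x**3) * math.cos(theta) ** 2)
    return total * h

numeric = half_line_integral()
print(value, numeric, 2 * math.pi / (3 * math.sqrt(3)))
```

Choosing a different branch of the logarithm would shift each $\log(z_k)$ by a multiple of $2\pi i$; the result is unchanged because the residues of $R$ itself sum to zero when $\deg Q - \deg P \geq 2$.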


2.4.4 Application: improper integrals with poles 


Suppose that we want to compute the principal value integral

\[ I = \mathrm{p.v.}\int_{-\infty}^{\infty} \frac{\sin x}{x}\,dx . \]

This integral should converge: the singularity at $x = 0$ is removable, as we
saw in equation (2.55), so that the integrand is continuous for all $x$, and the
rational function $1/x$ satisfies the conditions of the Jordan Lemma. Following
the ideas in the previous section, we would write

\[ I = \operatorname{Im}(I_0) \quad\text{where}\quad I_0 = \mathrm{p.v.}\int_{-\infty}^{\infty} \frac{e^{ix}}{x}\,dx , \tag{2.66} \]

and compute $I_0$. However notice that now the integrand of $I_0$ has a pole at
$x = 0$. Until now we have always assumed that integrands have no poles
along the contour, so the methods developed until now are not immediately
applicable to perform the above integral. We therefore need to make sense
out of integrals whose integrands are not continuous everywhere in the region
of integration.

Let $f(x)$ be a function of a real variable, which is continuous in the
interval $[a,b]$ except for a discontinuity at some point $c$, $a < c < b$. Then the
improper integrals of $f$ over the intervals $[a,c]$, $[c,b]$ and $[a,b]$ are defined
by

\[ \int_a^c f(x)\,dx = \lim_{r\searrow 0}\int_a^{c-r} f(x)\,dx , \]

\[ \int_c^b f(x)\,dx = \lim_{s\searrow 0}\int_{c+s}^b f(x)\,dx , \]

and

\[ \int_a^b f(x)\,dx = \lim_{r\searrow 0}\int_a^{c-r} f(x)\,dx + \lim_{s\searrow 0}\int_{c+s}^b f(x)\,dx \tag{2.67} \]

provided the appropriate limit(s) exist. We have used the notation $r \searrow 0$ to
mean that $r$ approaches 0 from above; that is, $r > 0$ as we take the limit. As
an example, consider the function $1/\sqrt{x}$ integrated on $[0,1]$:

\[ \int_0^1 \frac{dx}{\sqrt{x}} = \lim_{r\searrow 0}\int_r^1 \frac{dx}{\sqrt{x}} = \lim_{r\searrow 0} 2\sqrt{x}\,\Big|_r^1 = \lim_{r\searrow 0}\left[2 - 2\sqrt{r}\right] = 2 . \]

If the limits in (2.67) exist, then we can calculate the integral using symmetric
integration, which defines the principal value of the integral,

\[ \mathrm{p.v.}\int_a^b f(x)\,dx = \lim_{r\searrow 0}\left[\int_a^{c-r} f(x)\,dx + \int_{c+r}^b f(x)\,dx\right] . \]


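However the principal value integral may exist even when the improper integral
does not. Take, for instance,

\[ \begin{aligned} \mathrm{p.v.}\int_{-1}^{2} \frac{dx}{x} &= \lim_{r\searrow 0}\left[\int_{-1}^{-r} \frac{dx}{x} + \int_{r}^{2} \frac{dx}{x}\right] \\ &= \lim_{r\searrow 0}\left[\operatorname{Log}|x|\,\Big|_{-1}^{-r} + \operatorname{Log}|x|\,\Big|_{r}^{2}\right] \\ &= \lim_{r\searrow 0}\left[\operatorname{Log} r + \operatorname{Log} 2 - \operatorname{Log} r\right] = \operatorname{Log} 2 , \end{aligned} \]

whereas it is clear that the improper integral $\int_{-1}^{2} \frac{dx}{x}$ does not exist.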
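The difference between the two notions can be made concrete numerically. In the sketch below, `excised(r, s)` (a name introduced here for illustration) integrates $1/x$ over $[-1,-r]$ and $[s,2]$. With $r = s$ the result settles at $\operatorname{Log} 2$, whereas letting the two cut-offs shrink at different rates, here $s = r^2$, gives values that keep growing.

```python
import math

def midpoint(f, a, b, n=100000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

def excised(r, s):
    """Integral of 1/x over [-1, -r] together with [s, 2], for r, s > 0."""
    inv = lambda x: 1.0 / x
    return midpoint(inv, -1.0, -r) + midpoint(inv, s, 2.0)

symmetric = [excised(r, r) for r in (1e-1, 1e-2)]
lopsided = [excised(r, r * r) for r in (1e-1, 1e-2)]

print(symmetric, math.log(2))
print(lopsided)
```

The lopsided values behave like $\operatorname{Log} 2 - \operatorname{Log} r$, which is why the limits in (2.67) do not exist separately.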


When the function $f(x)$ is continuous everywhere in the real line except
at the point $c$ we define the principal value integral by

\[ \mathrm{p.v.}\int_{-\infty}^{\infty} f(x)\,dx = \lim_{\rho\to\infty}\,\lim_{r\searrow 0}\left[\int_{-\rho}^{c-r} f(x)\,dx + \int_{c+r}^{\rho} f(x)\,dx\right] \tag{2.68} \]

provided the limits $\rho \to \infty$ and $r \searrow 0$ exist independently. In the case of
several discontinuities $\{c_i\}$ we extend the definition of the improper integral
in the obvious way: excising a small symmetric interval $(c_i - r_i, c_i + r_i)$ about
each discontinuity and then taking the limits $r_i \searrow 0$ and, if applicable,
$\rho \to \infty$.

It turns out that principal value integrals of this type can often be evaluated
using the residue theorem. The residue theorem applies to closed
contours, so in computing a principal value integral we need to close the
contour, not just $\rho$ to $-\rho$ as in the previous section, but also $c-r$ to $c+r$.
One way to do this is to consider a small semicircle $S_r$ of radius $r$ around the
singular point $c$, as in Figure 2.10.

Figure 2.10: Closing the contour around a singularity.

Because we are interested in the limit $r \searrow 0$, we will have to consider the
integral

\[ \lim_{r\searrow 0}\int_{S_r} f(z)\,dz . \]


When the singularity of f(z) at z = c is a simple pole, this integral can be 
evaluated using the following result, which we state in some generality. 


Figure 2.11: A small circular arc. 


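Let $f(z)$ have a simple pole at $z = c$ and let $A_r$ be the circular arc in
Figure 2.11 parametrised by $z(\theta) = c + r\exp(i\theta)$ with $\theta_0 \leq \theta \leq \theta_1$. Then

\[ \lim_{r\searrow 0}\int_{A_r} f(z)\,dz = i\,(\theta_1 - \theta_0)\operatorname{Res}(f;c) . \]

Therefore for the semicircle $S_r$ in Figure 2.10 we have

\[ \lim_{r\searrow 0}\int_{S_r} f(z)\,dz = -i\pi\operatorname{Res}(f;c) . \tag{2.69} \]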


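Let us prove this result. Since $f(z)$ has a simple pole at $c$, its Laurent expansion in a
punctured disk $0 < |z - c| < R$ has the form

\[ f(z) = \frac{a_{-1}}{z-c} + \sum_{k=0}^{\infty} a_k (z-c)^k , \]

where

\[ g(z) = \sum_{k=0}^{\infty} a_k (z-c)^k \]

defines an analytic function in the disk $|z - c| < R$. Now let $0 < r < R$ and consider the
integral

\[ \int_{A_r} f(z)\,dz = a_{-1}\int_{A_r} \frac{dz}{z-c} + \int_{A_r} g(z)\,dz . \]

Because $g(z)$ is analytic it is in particular bounded on some neighbourhood of $c$, so that
$|g(z)| \leq M$ for some $M$ and all $|z - c| < R$. Then we can estimate its integral by using
(2.28):

\[ \left|\int_{A_r} g(z)\,dz\right| \leq \int_{A_r} |g(z)|\,|dz| \leq M\,\ell(A_r) = M r\,(\theta_1 - \theta_0) , \]

whence

\[ \lim_{r\searrow 0}\int_{A_r} g(z)\,dz = 0 . \]

On the other hand,

\[ \int_{A_r} \frac{dz}{z-c} = \int_{\theta_0}^{\theta_1} \frac{i\,r e^{i\theta}}{r e^{i\theta}}\,d\theta = i\int_{\theta_0}^{\theta_1} d\theta = i\,(\theta_1 - \theta_0) . \]

Therefore

\[ \lim_{r\searrow 0}\int_{A_r} f(z)\,dz = i\,(\theta_1 - \theta_0)\,a_{-1} + 0 = i\,(\theta_1 - \theta_0)\operatorname{Res}(f;c) . \]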
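The limit (2.69) can be watched numerically. The sketch below integrates $f(z) = e^{iz}/z$, which has a simple pole at $c = 0$ with residue 1, along the upper semicircle of radius $r$ traversed clockwise (from $\theta = \pi$ to $\theta = 0$) for shrinking $r$. The function `arc_integral` and its parameters are illustrative assumptions.

```python
import cmath
import math

def arc_integral(f, c, r, theta0, theta1, n=2000):
    """Midpoint rule for the integral of f along z = c + r*exp(i*theta),
    with theta running from theta0 to theta1."""
    h = (theta1 - theta0) / n
    total = 0j
    for k in range(n):
        theta = theta0 + (k + 0.5) * h
        w = r * cmath.exp(1j * theta)
        total += f(c + w) * 1j * w  # dz = i*w*dtheta
    return total * h

def f(z):
    return cmath.exp(1j * z) / z

for r in (1.0, 0.1, 0.01, 0.001):
    print(r, arc_integral(f, 0, r, math.pi, 0.0))
```

As $r$ decreases the printed values should approach $-i\pi$, with an error of order $r$ coming from the analytic part $g(z)$.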


Having discussed the basic theory, let us go back to the original problem:
the computation of the integral $I_0$ given in (2.66):

\[ I_0 = \lim_{\substack{\rho\to\infty \\ r\searrow 0}} \left[\int_{-\rho}^{-r} \frac{e^{ix}}{x}\,dx + \int_{r}^{\rho} \frac{e^{ix}}{x}\,dx\right] , \]

which for finite $\rho$ and nonzero $r$ can be understood as a contour integral in
the complex plane along the subset of the real axis consisting of the intervals
$[-\rho,-r]$ and $[r,\rho]$. In order to use the residue theorem we must close this
contour. The Jordan lemma forces us to join $\rho$ and $-\rho$ via a large semicircle
$C_\rho^+$ of radius $\rho$ in the upper half-plane. In order to join $-r$ and $r$ we choose a
small semicircle $S_r$ also in the upper half-plane. The resulting closed contour
is depicted in Figure 2.12.

Because the function is analytic on and inside the contour, the Cauchy
Integral Theorem says that the contour integral vanishes. Splitting this contour
integral into its different pieces, we have that

\[ \int_{-\rho}^{-r} \frac{e^{ix}}{x}\,dx + \int_{S_r} \frac{e^{iz}}{z}\,dz + \int_{r}^{\rho} \frac{e^{ix}}{x}\,dx + \int_{C_\rho^+} \frac{e^{iz}}{z}\,dz = 0 , \]

which remains true in the limits $\rho \to \infty$ and $r \searrow 0$. By the Jordan lemma,
the integral along $C_\rho^+$ vanishes in the limit $\rho \to \infty$, whence, using (2.69),

\[ I_0 = -\lim_{r\searrow 0}\int_{S_r} \frac{e^{iz}}{z}\,dz = i\pi\operatorname{Res}\left(\frac{e^{iz}}{z};0\right) = i\pi , \]

since the residue of $e^{iz}/z$ at $z = 0$ is equal to 1.

Figure 2.12: The contour in the calculation of $I_0$ in (2.66).

Therefore, we have that

\[ \mathrm{p.v.}\int_{-\infty}^{\infty} \frac{\sin x}{x}\,dx = \operatorname{Im}(i\pi) = \pi . \]


There are plenty of other integrals which can be calculated using the 
residue theorem; e.g., integrals involving multi-valued functions. We will not 
have time to discuss them all, but the lesson to take home from this cursory 
introduction to residue techniques is that when faced with a real integral, one 
should automatically think of this as a parametrisation of a contour integral 
in the complex plane, where we have at our disposal the powerful tools of 
complex analysis. 


2.4.5 Application: infinite series 


The final section of this part of the course is a beautiful application of the 
theory of residues to the computation of infinite sums. 
How can one use contour integration in order to calculate sums like the
following one:

\[ \sum_{n=1}^{\infty} \frac{1}{n^2} \;? \tag{2.70} \]


The idea is to exhibit this sum as part of the right-hand side of the
Cauchy Residue Theorem. For this we need a function $F(z)$ which has only
simple poles at the integers and whose residue is 1 there. We already met a
function which has an infinite number of poles which are integrally spaced:
the function $\cot z$ has simple poles for $z = n\pi$, $n = 0, \pm1, \pm2, \ldots$ with residues
equal to 1. Therefore the function $F(z) = \pi\cot(\pi z)$ has simple poles at
$z = n$, $n$ an integer, and the residue is still 1:

\[ \operatorname{Res}(F;n) = \lim_{z\to n} \frac{\pi\cos(\pi z)}{(\sin(\pi z))'} = \lim_{z\to n} \frac{\pi\cos(\pi z)}{\pi\cos(\pi z)} = 1 . \]

Now let $R(z) = P(z)/Q(z)$ be any rational function such that $\deg Q - \deg P \geq 2$.
Consider the function $f(z) = \pi\cot(\pi z)R(z)$ and let us integrate
this along the contour $\Gamma_N$, for $N$ a positive integer, defined as the positively
oriented square with vertices $(N+\frac12)(1+i)$, $(N+\frac12)(-1+i)$, $(N+\frac12)(-1-i)$
and $(N+\frac12)(1-i)$, as shown in Figure 2.13. Notice that the contour misses
the poles of $\pi\cot(\pi z)$. Assuming that $N$ is taken to be large enough, and
since $R(z)$ has a finite number of poles, one can also guarantee that the
contour will miss the poles of $R(z)$.


Figure 2.13: The contour $\Gamma_N$.


Let us compute the integral of the function $f(z)$ along this contour,

\[ \int_{\Gamma_N} \pi\cot(\pi z)R(z)\,dz \]

in two ways. On the one hand we can use the residue theorem to say that
the integral will be $(2\pi i)$ times the sum of the residues at the poles of $f(z)$
inside the contour.
These poles are of two types: the poles of $R(z)$ and the poles of $\pi\cot(\pi z)$,
which occur at the integers. Let us assume for simplicity that $R(z)$ has no
poles at integer values of $z$, so that the poles of $R(z)$ and $\pi\cot(\pi z)$ do not
coincide. Therefore we see that

\[ \int_{\Gamma_N} \pi\cot(\pi z)R(z)\,dz = 2\pi i\left[\sum_{n=-N}^{N} \operatorname{Res}(f;n) + \sum_{\substack{\text{poles } z_k \text{ of } R \\ \text{inside } \Gamma_N}} \operatorname{Res}(f;z_k)\right] . \]

The residue of $f(z)$ at $z = n$ is easy to compute. Since by assumption $R(z)$
is analytic there and $\pi\cot(\pi z)$ has a simple pole with residue 1, we see that
around $z = n$, we have

\[ f(z) = R(z)\,\pi\cot(\pi z) = R(z)\left(\frac{1}{z-n} + \cdots\right) = \frac{R(n)}{z-n} + h(z) , \]

where $h(z)$ is analytic at $z = n$. Therefore,

\[ \operatorname{Res}(f;n) = \lim_{z\to n}\left[(z-n)f(z)\right] = R(n) + 0 , \]


and as a result,

\[ \int_{\Gamma_N} \pi\cot(\pi z)R(z)\,dz = 2\pi i\left[\sum_{n=-N}^{N} R(n) + \sum_{\substack{\text{poles } z_k \text{ of } R \\ \text{inside } \Gamma_N}} \operatorname{Res}(f;z_k)\right] . \tag{2.71} \]

On the other hand we can estimate the integral for large enough $N$ as
follows. First of all because of the condition on $R(z)$, we have that for large
$|z|$,

\[ |R(z)| \leq \frac{c}{|z|^2} . \]

Similarly, it can be shown that the function $\pi\cot(\pi z)$ is bounded along the
contour, so that $|\pi\cot(\pi z)| \leq K$ for some $K$ independent of $N$.


© Indeed, notice that

\[ \cot(\pi z) = \frac{\cos(\pi z)}{\sin(\pi z)} = i\,\frac{e^{i\pi z} + e^{-i\pi z}}{e^{i\pi z} - e^{-i\pi z}} = i\,\frac{1 + e^{-2i\pi z}}{1 - e^{-2i\pi z}} . \]

Therefore along the segment of the contour parametrised by $z(t) = (N+\frac12) + it$ for
$t \in [-N-\frac12, N+\frac12]$, we have that

\[ |\cot(\pi z(t))| = \left|\frac{1 + e^{-i2\pi((N+\frac12)+it)}}{1 - e^{-i2\pi((N+\frac12)+it)}}\right| = \left|\frac{1 - e^{2\pi t}}{1 + e^{2\pi t}}\right| \leq 1 ; \]

whereas along the segment of the contour parametrised by $z(t) = t - i(N+\frac12)$ for
$t \in [-N-\frac12, N+\frac12]$, we have that

\[ |\cot(\pi z(t))| = \left|\frac{1 + e^{-2\pi i t}\,e^{-\pi(2N+1)}}{1 - e^{-2\pi i t}\,e^{-\pi(2N+1)}}\right| \leq \frac{1 + e^{-\pi(2N+1)}}{1 - e^{-\pi(2N+1)}} , \]

where we have used the triangle inequality on the numerator and (2.36) on the
denominator. The remaining two sides of the square are handled in the same way. But

\[ \frac{1 + e^{-\pi(2N+1)}}{1 - e^{-\pi(2N+1)}} \]

is maximised for $N = 0$, whence it is bounded.
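The uniform bound on $|\cot(\pi z)|$ along $\Gamma_N$ can also be checked by brute force. The sketch below samples the four sides of the square for a few values of $N$ and compares the largest sampled value with the $N = 0$ bound $(1+e^{-\pi})/(1-e^{-\pi})$. The function name and the number of samples are assumptions of this illustration, and $N$ is kept moderate to avoid floating-point overflow in the hyperbolic growth of $\cos$ and $\sin$.

```python
import cmath
import math

def max_abs_cot_on_square(N, samples_per_side=2000):
    """Largest sampled value of |cot(pi z)| on the square with corners (N + 1/2)(+-1 +- i)."""
    half = N + 0.5
    worst = 0.0
    for k in range(samples_per_side + 1):
        t = -half + 2 * half * k / samples_per_side
        sides = (complex(half, t), complex(-half, t), complex(t, half), complex(t, -half))
        for z in sides:
            value = abs(cmath.cos(math.pi * z) / cmath.sin(math.pi * z))
            worst = max(worst, value)
    return worst

bound = (1 + math.exp(-math.pi)) / (1 - math.exp(-math.pi))
for N in (0, 1, 5, 20):
    print(N, max_abs_cot_on_square(N), bound)
```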


Since the length of the contour $\Gamma_N$ is given by $4(2N+1)$, equation (2.28)
gives the following estimate for the integral

\[ \left|\int_{\Gamma_N} \pi\cot(\pi z)R(z)\,dz\right| \leq \frac{K c}{(N+\frac12)^2}\,4(2N+1) , \]

which vanishes in the limit $N \to \infty$. Therefore, taking the limit $N \to \infty$ of
equation (2.71), and using that the left-hand side vanishes, one finds

\[ \sum_{n=-\infty}^{\infty} R(n) = -\sum_{\text{poles } z_k \text{ of } R} \operatorname{Res}(f;z_k) . \]

More generally, if $R(z)$ does have some poles for integer values of $z$, then we
have to take care not to over-count these poles in the sum of the residues.
We will count them as poles of $R(z)$ and not as poles of $\pi\cot(\pi z)$, and the
same argument as above yields the general formula:

\[ \sum_{\substack{n=-\infty \\ n\neq z_k}}^{\infty} R(n) = -\sum_{\text{poles } z_k \text{ of } R} \operatorname{Res}(f;z_k), \quad\text{for } f(z) = \pi\cot(\pi z)\,R(z). \tag{2.72} \]


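Let us compute then the sum (2.70). Notice that

\[ \sum_{n=1}^{\infty} \frac{1}{n^2} = \frac12 \sum_{\substack{n=-\infty \\ n\neq 0}}^{\infty} \frac{1}{n^2} . \]

The function $R(z) = 1/z^2$ has a double pole at $z = 0$, hence by (2.72)

\[ \sum_{n=1}^{\infty} \frac{1}{n^2} = -\frac12\operatorname{Res}\left(\frac{\pi\cot(\pi z)}{z^2};0\right) = -\frac14 \lim_{z\to 0} \frac{d^2}{dz^2}\left[\pi z\cot(\pi z)\right] . \]

Now, the Laurent expansion of $\pi\cot(\pi z)$ around $z = 0$ is given by

\[ \pi\cot(\pi z) = \frac{1}{z} - \frac{\pi^2 z}{3} - \frac{\pi^4 z^3}{45} + O(z^5) , \tag{2.73} \]

whence

\[ \sum_{n=1}^{\infty} \frac{1}{n^2} = -\frac12\left(-\frac{\pi^2}{3}\right) = \frac{\pi^2}{6} . \]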


This sum has an interesting history. Its computation was an open problem in the 18th 
century for quite some time. It was known that the series was convergent (proven in fact 
by one of the Bernoullis) but it was up to Euler to calculate it. His “proof” is elementary 
and quite clever. Start with the Taylor series for the sine function: 


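\[ \sin x = x\left(1 - \frac{x^2}{3!} + \frac{x^4}{5!} - \cdots\right) , \]

and treat the expression in parenthesis as an algebraic equation in $x^2$. Its solutions are
known: $n^2\pi^2$ for $n = 1, 2, 3, \ldots$. Suppose we could factorise the expression in parenthesis:

\[ \left(1 - \frac{x^2}{\pi^2}\right)\left(1 - \frac{x^2}{(2\pi)^2}\right)\left(1 - \frac{x^2}{(3\pi)^2}\right)\cdots = 1 - \left(\frac{1}{\pi^2} + \frac{1}{(2\pi)^2} + \frac{1}{(3\pi)^2} + \cdots\right)x^2 + O(x^4) . \]

Therefore, comparing the coefficient of $x^2$, we see that

\[ \frac{1}{3!} = \frac{1}{\pi^2} + \frac{1}{(2\pi)^2} + \frac{1}{(3\pi)^2} + \cdots = \sum_{n=1}^{\infty} \frac{1}{(n\pi)^2} , \]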


which upon multiplication by 7? yields the sum. 


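Similarly, we can compute the sum

\[ \sum_{n=1}^{\infty} \frac{1}{n^4} = -\frac12\operatorname{Res}(f;0) , \]

where $f(z) = \pi\cot(\pi z)/z^4$, whose Laurent series about $z = 0$ can be read
off from (2.73) above:

\[ f(z) = \frac{1}{z^5} - \frac{\pi^2}{3z^3} - \frac{\pi^4}{45z} + O(z) , \]

whence

\[ \sum_{n=1}^{\infty} \frac{1}{n^4} = \frac{\pi^4}{90} . \]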
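Both sums can be confirmed numerically. The following sketch evaluates the residues at $z = 0$ by a contour integral around a circle of radius $\frac12$ (which stays clear of the poles at $z = \pm1$) and compares them with partial sums. The helper `residue` and the variable names are chosen for this illustration only.

```python
import cmath
import math

def residue(f, z0, radius=0.5, n=256):
    """Approximate Res(f; z0) as (1/(2*pi*i)) times the integral of f over the
    circle |z - z0| = radius, which is assumed to enclose no other singularity."""
    total = 0j
    for k in range(n):
        w = radius * cmath.exp(2j * math.pi * k / n)
        total += f(z0 + w) * w  # dz = i*w*dtheta; the i cancels against 1/(2*pi*i)
    return total / n

def pi_cot(z):
    return math.pi * cmath.cos(math.pi * z) / cmath.sin(math.pi * z)

sum_inverse_squares = -0.5 * residue(lambda z: pi_cot(z) / z**2, 0)
sum_inverse_fourths = -0.5 * residue(lambda z: pi_cot(z) / z**4, 0)

partial2 = sum(1 / n**2 for n in range(1, 100001))
partial4 = sum(1 / n**4 for n in range(1, 1001))

print(sum_inverse_squares, partial2, math.pi**2 / 6)
print(sum_inverse_fourths, partial4, math.pi**4 / 90)
```

The partial sums converge slowly (the tail of $\sum 1/n^2$ beyond $N$ is about $1/N$), which is one reason the closed forms obtained from the residues are valuable.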


Infinite alternating sums 


The techniques above can be extended to the computation of infinite alternating
sums of the form

\[ \sum_{n=-\infty}^{\infty} (-1)^n R(n) , \]

where $R(z) = P(z)/Q(z)$ is a rational function with $\deg Q - \deg P \geq 2$. Now
what is needed is a function $G(z)$ which has a simple pole at $z = n$, for $n$
an integer, and whose residue there is $(-1)^n$. We claim that this function is
$\pi\csc(\pi z)$. Indeed, the Laurent expansion about $z = 0$ is given by

\[ \pi\csc(\pi z) = \frac{1}{z} + \frac{\pi^2 z}{6} + \frac{7\pi^4 z^3}{360} + O(z^5) , \tag{2.74} \]

whence its residue at 0 is 1. Because of the periodicity $\csc(\pi(z+2k)) =
\csc(\pi z + 2k\pi) = \csc(\pi z)$ for any integer $k$, this is also the residue about
every even integer. Now from the relation $\csc(\pi(z+1)) = \csc(\pi z + \pi) =
-\csc(\pi z)$, we notice that the residue at every odd integer is $-1$. Therefore
we conclude that for $G(z) = \pi\csc(\pi z)$, $\operatorname{Res}(G;n) = (-1)^n$.

The trigonometric identity

\[ (\csc(\pi z))^2 = 1 + (\cot(\pi z))^2 , \]

implies that $\csc(\pi z)$ is also bounded along the contour $\Gamma_N$, with a bound
which is independent of $N$ just like for $\cot(\pi z)$. Just as was done above for
the cotangent function, we can now prove that the integral of the function
$f(z) = \pi\csc(\pi z)R(z)$ along $\Gamma_N$ vanishes in the limit $N \to \infty$. This proof is
virtually identical to the one given above. Therefore we can conclude that

\[ \sum_{\substack{n=-\infty \\ n\neq z_k}}^{\infty} (-1)^n R(n) = -\sum_{\text{poles } z_k \text{ of } R} \operatorname{Res}(f;z_k), \quad\text{for } f(z) = \pi\csc(\pi z)\,R(z). \tag{2.75} \]
As an example, let us compute the alternating sums

\[ S_1 = \sum_{n=1}^{\infty} \frac{(-1)^n}{n^2} \quad\text{and}\quad S_2 = \sum_{n=1}^{\infty} \frac{(-1)^n}{n^4} . \]

For the first sum we have that

\[ S_1 = -\frac12\operatorname{Res}(f;0) , \]

where $f(z) = \pi\csc(\pi z)/z^2$, whose Laurent expansion about $z = 0$ can be
read off from (2.74):

\[ f(z) = \frac{1}{z^3} + \frac{\pi^2}{6z} + \frac{7\pi^4 z}{360} + O(z^3) , \]

whence the residue is $\pi^2/6$ and the sum

\[ S_1 = -\frac{\pi^2}{12} . \]

For the second sum we also have that

\[ S_2 = -\frac12\operatorname{Res}(f;0) , \]

where the function $f(z) = \pi\csc(\pi z)/z^4$ has now a Laurent series

\[ f(z) = \frac{1}{z^5} + \frac{\pi^2}{6z^3} + \frac{7\pi^4}{360z} + O(z) , \]

whence the residue is $7\pi^4/360$ and the sum

\[ S_2 = -\frac{7\pi^4}{720} . \]
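A quick numerical cross-check of the two alternating sums, under the same conventions as before: the residues of $\pi\csc(\pi z)/z^2$ and $\pi\csc(\pi z)/z^4$ at $z = 0$ are approximated by a contour integral on a circle of radius $\frac12$ and compared with partial sums. All names in the sketch are illustrative.

```python
import cmath
import math

def residue(f, z0, radius=0.5, n=256):
    """Approximate Res(f; z0) as (1/(2*pi*i)) times the integral of f over the
    circle |z - z0| = radius, which is assumed to enclose no other singularity."""
    total = 0j
    for k in range(n):
        w = radius * cmath.exp(2j * math.pi * k / n)
        total += f(z0 + w) * w  # dz = i*w*dtheta; the i cancels against 1/(2*pi*i)
    return total / n

def pi_csc(z):
    return math.pi / cmath.sin(math.pi * z)

S1 = -0.5 * residue(lambda z: pi_csc(z) / z**2, 0)
S2 = -0.5 * residue(lambda z: pi_csc(z) / z**4, 0)

partial1 = sum((-1) ** n / n**2 for n in range(1, 2001))
partial2 = sum((-1) ** n / n**4 for n in range(1, 201))

print(S1, partial1, -math.pi**2 / 12)
print(S2, partial2, -7 * math.pi**4 / 720)
```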


Sums involving binomial coefficients 


There are other types of sums which can also be performed or at least estimated
using residue techniques, particularly sums whose coefficients are
related to the binomial coefficients, as in $\sum_{n=0}^{\infty} \binom{2n}{n} R(n)$. By definition, the
binomial coefficient $\binom{n}{k}$ is the coefficient of $z^k$ in the binomial expansion of
$(1+z)^n$. In other words, using the residue theorem,

\[ \binom{n}{k} = \frac{1}{2\pi i}\oint_{\Gamma} \frac{(1+z)^n}{z^{k+1}}\,dz , \]

where $\Gamma$ is any positively oriented loop surrounding the origin.
Suppose that we wish to compute the sum

\[ S = \sum_{n=0}^{\infty} \binom{2n}{n}\frac{1}{5^n} . \]
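We can substitute the integral representation for the binomial coefficient,

\[ S = \sum_{n=0}^{\infty} \frac{1}{2\pi i}\oint_{\Gamma} \frac{(1+z)^{2n}}{z^{n+1}}\,\frac{dz}{5^n} = \frac{1}{2\pi i}\sum_{n=0}^{\infty}\oint_{\Gamma} \left[\frac{(1+z)^2}{5z}\right]^n \frac{dz}{z} . \]

Now provided that we choose $\Gamma$ inside the domain of convergence of the series
$\sum_{n=0}^{\infty} \left[\frac{(1+z)^2}{5z}\right]^n$, then we would obtain that by uniform convergence, the integral
of the sum is the sum of the termwise integrals. Being a geometric series, its
convergence is uniform in the region

\[ \left|\frac{(1+z)^2}{5z}\right| < 1 , \]

so choose the contour $\Gamma$ inside this region. For definiteness we can choose
the unit circle, since on the unit circle:

\[ \left|\frac{(1+z)^2}{5z}\right| \leq \frac45 . \]

In this case, we can interchange the order of the summation and the integration:

\[ S = \frac{1}{2\pi i}\oint_{\Gamma} \sum_{n=0}^{\infty} \left[\frac{(1+z)^2}{5z}\right]^n \frac{dz}{z} = \frac{1}{2\pi i}\oint_{\Gamma} \frac{1}{1 - \frac{(1+z)^2}{5z}}\,\frac{dz}{z} = \frac{5}{2\pi i}\oint_{\Gamma} \frac{dz}{3z - 1 - z^2} . \]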
Now the integral can be performed using the residue theorem. The integrand
has simple poles at $(3 \pm \sqrt5)/2$ of which only $(3 - \sqrt5)/2$ lies inside the
contour. Therefore,

\[ S = 5\operatorname{Res}\left(f;\frac{3-\sqrt5}{2}\right) \quad\text{where}\quad f(z) = \frac{1}{3z - 1 - z^2} . \]

Computing the residue, we find $\operatorname{Res}(f;(3-\sqrt5)/2) = 1/\sqrt5$, whence $S = \sqrt5$.
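The value $S = \sqrt5$ is easy to confirm numerically, both from the series itself and from the contour integral on the unit circle. In the sketch below the number of terms and the number of sample points on the circle are arbitrary choices made for illustration.

```python
import cmath
import math

# Direct partial sum; the terms decay roughly like (4/5)^n, so 200 terms are plenty.
partial = sum(math.comb(2 * n, n) / 5**n for n in range(200))

def contour_value(n=512):
    """Trapezoid rule for (5/(2*pi*i)) times the integral of dz/(3z - 1 - z^2)
    over the unit circle."""
    total = 0j
    for k in range(n):
        z = cmath.exp(2j * math.pi * k / n)
        total += 5 / (3 * z - 1 - z * z) * z  # dz = i*z*dtheta; the i cancels
    return total / n

contour = contour_value()
print(partial, contour, math.sqrt(5))
```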
Chapter 3


Integral Transforms 


This part of the course introduces two extremely powerful methods for solving
differential equations: the Fourier and the Laplace transforms. Besides its
practical use, the Fourier transform is also of fundamental importance in
quantum mechanics, providing the correspondence between the position and
momentum representations of the Heisenberg commutation relations.

An integral transform is useful if it allows one to turn a complicated 
problem into a simpler one. The transforms we will be studying in this part 
of the course are mostly useful to solve differential and, to a lesser extent, 
integral equations. The idea behind a transform is very simple. To be definite 
suppose that we want to solve a differential equation, with unknown function 
f. One first applies the transform to the differential equation to turn it into 
an equation one can solve easily: often an algebraic equation for the transform 
F of f. One then solves this equation for F and finally applies the inverse 
transform to find f. This circle (or square!) of ideas can be represented 
diagrammatically as follows: 


      algebraic equation for F   -------->   solution: F
                ^                                 |
                | transform                       | inverse transform
                |                                 v
    differential equation for f  - - - - >   solution: f


We would like to follow the dashed line, but this is often very difficult.
Therefore we follow the solid line instead: it may seem a longer path, but it
has the advantage of being straightforward. After all, what is the purpose of 
developing formalism if not to reduce the solution of complicated problems 
to a set of simple rules which even a machine could follow? 

We will start by reviewing Fourier series in the context of one particular 
example: the vibrating string. This will have the added benefit of introduc- 
ing the method of separation of variables in order to solve partial differential 
equations. In the limit as the vibrating string becomes infinitely long, the 
Fourier series naturally gives rise to the Fourier integral transform, which we 
will apply to find steady-state solutions to differential equations. In partic- 
ular we will apply this to the one-dimensional wave equation. In order to 
deal with transient solutions of differential equations, we will introduce the 
Laplace transform. This will then be applied, among other problems, to the 
solution of initial value problems. 


3.1 Fourier series 


In this section we will discuss the Fourier expansion of periodic functions of 
a real variable. As a practical application, we start with the study of the 
vibrating string, where the Fourier series makes a natural appearance. 


3.1.1 The vibrating string 


Consider a string of length $L$ which is clamped at both ends. Let $x$ denote
the position along the string, such that the two ends of the string are at
$x = 0$ and $x = L$, respectively. The string has tension $T$ and a uniform
mass density $\mu$, and it is allowed to vibrate. If we think of the string as
being composed of an infinite number of infinitesimal masses, we model the
vibrations by a function $\psi(x,t)$ which describes the vertical displacement at
time $t$ of the mass at position $x$. It can be shown that for small vertical
displacements, $\psi(x,t)$ obeys the following equation:

\[ T\,\frac{\partial^2\psi(x,t)}{\partial x^2} = \mu\,\frac{\partial^2\psi(x,t)}{\partial t^2} , \]

which can be recognised as the one-dimensional wave equation

\[ \frac{\partial^2\psi}{\partial x^2} = \frac{1}{c^2}\,\frac{\partial^2\psi}{\partial t^2} , \tag{3.1} \]

where $c = \sqrt{T/\mu}$ is the wave velocity. This is a partial differential equation
which needs for its solution to be supplemented by boundary conditions for
$x$ and initial conditions for $t$. Because the string is clamped at both ends,
the boundary conditions are

\[ \psi(0,t) = \psi(L,t) = 0 , \quad\text{for all } t. \tag{3.2} \]

As initial conditions we specify that at $t = 0$,

\[ \left.\frac{\partial\psi(x,t)}{\partial t}\right|_{t=0} = 0 \quad\text{and}\quad \psi(x,0) = f(x) , \quad\text{for all } x, \tag{3.3} \]

where $f$ is a continuous function which, for consistency with the boundary
conditions (3.2), must satisfy $f(0) = f(L) = 0$. In other words, the string is
released from rest from an initial shape given by the function $f$.


This is not the only type of initial conditions that could be imposed. For example, in the
case of, say, a piano string, it would be much more sensible to consider an initial condition
in which the string is horizontal so that $\psi(x,0) = 0$, but such that it is given a blow at
$t = 0$, which means that $\left.\frac{\partial\psi(x,t)}{\partial t}\right|_{t=0} = g(x)$ for some function $g$. More generally still, we
could consider mixed initial conditions in which $\psi(x,0) = f(x)$ and $\left.\frac{\partial\psi(x,t)}{\partial t}\right|_{t=0} = g(x)$.
These different initial conditions can be analysed in roughly the same way.


We will solve the wave equation by the method of separation of variables.
This consists of choosing as an Ansatz for $\psi(x,t)$ the product of two functions,
one depending only on $x$ and the other only on $t$: $\psi(x,t) = u(x)\,v(t)$. We
do not actually expect the solution to be of this form; but, as we
will review below, the equation is linear and one can use the principle of
superposition to construct the desired solution out of decomposable solutions
of this type. At any rate, inserting this Ansatz into (3.1), we have

\[ u''(x)\,v(t) = \frac{1}{c^2}\,u(x)\,v''(t) , \]

where we are using primes to denote derivatives with respect to the variable
on which the function depends: $u'(x) = du/dx$ and $v'(t) = dv/dt$. We now
divide both sides of the equation by $u(x)\,v(t)$, and obtain

\[ \frac{u''(x)}{u(x)} = \frac{1}{c^2}\,\frac{v''(t)}{v(t)} . \]

Now comes the reason that this method works, so pay close attention.
Notice that the right-hand side does not depend on $x$, and that the left-hand
side does not depend on $t$. Since they are equal, both sides have to be equal
to a constant which, with some foresight, we choose to call $-\lambda^2$, as it will be
a negative number in the case of interest. The equation therefore breaks up
into two ordinary differential equations:

\[ u''(x) = -\lambda^2\,u(x) \quad\text{and}\quad v''(t) = -\lambda^2 c^2\,v(t) . \]

The boundary conditions say that $u(0) = u(L) = 0$.

Let us consider the first equation. It has three types of solutions depending
on whether $\lambda$ is nonzero real, nonzero imaginary or zero. (Notice that
$-\lambda^2$ has to be real, so that these are the only possibilities.) If $\lambda = 0$, then
the solution is $u(x) = a + b\,x$. The boundary condition $u(0) = 0$ means that
$a = 0$, but the boundary condition $u(L) = 0$ then means that $b = 0$, whence
$u(x) = 0$ for all $x$. Clearly this is a very uninteresting solution. Let us
consider $\lambda$ imaginary. Then the solution is now $a\exp(|\lambda|\,x) + b\exp(-|\lambda|\,x)$.
Again the boundary conditions force $a = b = 0$. Therefore we are left with
the possibility of $\lambda$ real. Then the solution is

\[ u(x) = a\cos\lambda x + b\sin\lambda x . \]

The boundary condition $u(0) = 0$ forces $a = 0$. Finally the boundary condition
$u(L) = 0$ implies that

\[ \sin\lambda L = 0 \quad\Longrightarrow\quad \lambda = \frac{n\pi}{L} \quad\text{for } n \text{ an integer.} \]

Actually $n = 0$ is an uninteresting solution, and because of the fact that the
sine is an odd function, negative values of $n$ give rise to the same solution
(up to a sign) as positive values of $n$. In other words, all nontrivial distinct
solutions are given (up to a constant multiple) by

\[ u_n(x) = \sin\lambda_n x , \quad\text{with } \lambda_n = \frac{n\pi}{L} \text{ and where } n = 1, 2, 3, \ldots. \tag{3.4} \]

Let us now solve for $v(t)$. Its equation is

\[ v''(t) = -\lambda^2 c^2\,v(t) , \]

whence

\[ v(t) = a\cos\lambda c t + b\sin\lambda c t . \]

The first of the two initial conditions (3.3) says that $v'(0) = 0$ whence $b = 0$.
Therefore for any positive integer $n$, the function

\[ \psi_n(x,t) = \sin\lambda_n x\,\cos\lambda_n c t , \quad\text{with } \lambda_n = \frac{n\pi}{L} , \]

satisfies the wave equation (3.1) subject to the boundary conditions (3.2)
and to the first of the initial conditions (3.3).
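That each $\psi_n$ really does satisfy (3.1) and (3.2) can be verified numerically with finite differences. In the sketch below the values of $L$ and $c$, the sample point and the step size are arbitrary choices made for illustration; the residual should vanish up to the discretisation error of the second differences.

```python
import math

L = 2.0  # string length (arbitrary value for this check)
c = 3.0  # wave velocity (arbitrary value for this check)

def psi(n, x, t):
    """The separable solution sin(lambda_n x) cos(lambda_n c t), lambda_n = n*pi/L."""
    lam = n * math.pi / L
    return math.sin(lam * x) * math.cos(lam * c * t)

def second_difference(g, s, h=1e-3):
    """Central finite-difference approximation to the second derivative g''(s)."""
    return (g(s + h) - 2 * g(s) + g(s - h)) / h**2

def wave_residual(n, x, t):
    """psi_xx - psi_tt / c^2, which should be close to zero for a solution of (3.1)."""
    psi_xx = second_difference(lambda y: psi(n, y, t), x)
    psi_tt = second_difference(lambda s: psi(n, x, s), t)
    return psi_xx - psi_tt / c**2

print(wave_residual(3, 0.7, 0.4))
print(psi(3, 0.0, 0.4), psi(3, L, 0.4))  # boundary values, both essentially zero
```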

Now notice something important: the wave equation (3.1) is linear; that
is, if $\psi(x,t)$ and $\phi(x,t)$ are solutions of the wave equation, so is any linear
combination $\alpha\,\psi(x,t) + \beta\,\phi(x,t)$ where $\alpha$ and $\beta$ are constants.

Clearly then, any linear combination of the $\psi_n(x,t)$ will also be a solution.
In other words, the most general solution subject to the boundary
conditions (3.2) and the first of the initial conditions in (3.3) is given by a
linear combination

\[ \psi(x,t) = \sum_{n=1}^{\infty} b_n \sin\lambda_n x\,\cos\lambda_n c t . \]

Of course, this expression is formal as it stands: it is an infinite sum which
does not necessarily make sense, unless we choose the coefficients $\{b_n\}$ in such
a way that the series converges, and that the convergence is such that we can
differentiate the series termwise at least twice.

We can now finally impose the second of the initial conditions (3.3):

\[ \psi(x,0) = \sum_{n=1}^{\infty} b_n \sin\lambda_n x = f(x) . \tag{3.5} \]


At first sight this seems hopeless: can any function $f(x)$ be represented as
a series of this form? The Bernoullis, who were the first to get this far,
thought that this was not the case and that in some sense the solution was
only valid for special kinds of functions for which such a series expansion is
possible. It took Euler to realise that, in a certain sense, all functions $f(x)$
with $f(0) = f(L) = 0$, can be expanded in this way. He did this by showing
how the coefficients $\{b_n\}$ are determined by the function $f(x)$.

To do so let us argue as follows. Let $n$ and $m$ be positive integers and
consider the functions $u_n(x)$ and $u_m(x)$ defined in (3.4). These functions
satisfy the differential equations:
$$u_n''(x) = -\lambda_n^2\, u_n(x) \quad \text{and} \quad u_m''(x) = -\lambda_m^2\, u_m(x) .$$
Let us multiply the first equation by $u_m(x)$ and the second equation by $u_n(x)$
and subtract one from the other to obtain
$$u_n''(x)\, u_m(x) - u_n(x)\, u_m''(x) = \left( \lambda_m^2 - \lambda_n^2 \right) u_n(x)\, u_m(x) .$$


We notice that the left-hand side of the equation is a total derivative
$$u_n''(x)\, u_m(x) - u_n(x)\, u_m''(x) = \left( u_n'(x)\, u_m(x) - u_n(x)\, u_m'(x) \right)' ,$$
whence integrating both sides of the equation from $x = 0$ to $x = L$, we obtain
$$\left( \lambda_m^2 - \lambda_n^2 \right) \int_0^L u_m(x)\, u_n(x)\, dx = \left. \left( u_n'(x)\, u_m(x) - u_n(x)\, u_m'(x) \right) \right|_0^L = 0 ,$$
since $u_n(0) = u_n(L) = 0$ and the same for $u_m$. Therefore we see that unless
$\lambda_n^2 = \lambda_m^2$, which is equivalent to $n = m$ (since $n$, $m$ are positive integers), the
integral $\int_0^L u_m(x)\, u_n(x)\, dx$ vanishes. On the other hand, if $m = n$, we have
that
$$\int_0^L u_n(x)^2\, dx = \int_0^L \left( \sin \frac{n\pi x}{L} \right)^2 dx = \int_0^L \left( \frac12 - \frac12 \cos \frac{2n\pi x}{L} \right) dx = \frac{L}{2} .$$


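Therefore, in summary, we have the orthogonality property of the functions
$u_m(x)$:
$$\int_0^L u_m(x)\, u_n(x)\, dx = \begin{cases} \dfrac{L}{2} , & \text{if } n = m, \text{ and} \\[1ex] 0 , & \text{otherwise.} \end{cases} \tag{3.6}$$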
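The orthogonality relation (3.6) is easy to check numerically. The following Python sketch is only an illustration: the function name `overlap`, the choice $L = 1$ and the midpoint rule are assumptions made here and are not part of the course material.

```python
import math

def overlap(n, m, L=1.0, samples=4000):
    """Midpoint-rule estimate of the integral over [0, L] of u_n(x) u_m(x),
    where u_n(x) = sin(n pi x / L)."""
    h = L / samples
    total = 0.0
    for j in range(samples):
        x = (j + 0.5) * h
        total += math.sin(n * math.pi * x / L) * math.sin(m * math.pi * x / L)
    return total * h

# Table of overlaps for n, m = 1, 2, 3.
for n in range(1, 4):
    print([round(overlap(n, m), 6) for m in range(1, 4)])
```

One expects values close to $L/2$ on the diagonal and close to zero elsewhere.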


Let us now go back to the solution of the remaining initial condition (3.5).
This condition can be rewritten as
$$f(x) = \sum_{n=1}^{\infty} b_n\, u_n(x) = \sum_{n=1}^{\infty} b_n \sin \frac{n\pi x}{L} . \tag{3.7}$$
Let us multiply both sides by $u_m(x)$ and integrate from $x = 0$ to $x = L$:
$$\int_0^L f(x)\, u_m(x)\, dx = \sum_{n=1}^{\infty} b_n \int_0^L u_n(x)\, u_m(x)\, dx ,$$


where we have interchanged the order of integration and summation with
impunity.¹ Using the orthogonality relation (3.6) we see that of all the terms
in the right-hand side, only the term with $n = m$ contributes to the sum,
whence
$$\int_0^L f(x)\, u_m(x)\, dx = b_m\, \frac{L}{2} ,$$
or in other words,
$$b_m = \frac{2}{L} \int_0^L f(x)\, u_m(x)\, dx , \tag{3.8}$$


a formula due to Euler. Finally, the solution of the wave equation (3.1) with
boundary conditions (3.2) and initial conditions (3.3) is
$$\psi(x,t) = \sum_{n=1}^{\infty} b_n \sin \frac{n\pi x}{L} \cos \frac{n\pi c t}{L} , \tag{3.9}$$
where
$$b_n = \frac{2}{L} \int_0^L f(x) \sin \frac{n\pi x}{L}\, dx .$$

¹ This would have to be justified, but in this part of the course we will be much more
cavalier about these things. The amount of material that would have to be introduced to
be able to justify this procedure is too much for a course at this level and of this length.


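Inserting this expression into the solution (3.9), we find that
$$\begin{aligned}
\psi(x,t) &= \sum_{n=1}^{\infty} \left[ \frac{2}{L} \int_0^L f(y) \sin \frac{n\pi y}{L}\, dy \right] \sin \frac{n\pi x}{L} \cos \frac{n\pi c t}{L} \\
&= \int_0^L \left[ \frac{2}{L} \sum_{n=1}^{\infty} \sin \frac{n\pi y}{L} \sin \frac{n\pi x}{L} \cos \frac{n\pi c t}{L} \right] f(y)\, dy \\
&= \int_0^L K(x,y;t)\, f(y)\, dy ,
\end{aligned}$$
where the propagator $K(x,y;t)$ is (formally) defined by
$$K(x,y;t) = \frac{2}{L} \sum_{n=1}^{\infty} \sin \frac{n\pi y}{L} \sin \frac{n\pi x}{L} \cos \frac{n\pi c t}{L} .$$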


To understand why it is called a propagator, notice that
$$\psi(x,t) = \int_0^L K(x,y;t)\, \psi(y,0)\, dy ,$$
so that one can obtain $\psi(x,t)$ from its value at $t = 0$ simply by multiplying
by $K(x,y;t)$ and integrating; hence $K(x,y;t)$ allows us to propagate the
configuration at $t = 0$ to any other time $t$.
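As a numerical illustration of the series solution (3.9) with coefficients given by the Euler formula (3.8), here is a short Python sketch. The initial shape $f(x) = x(L - x)$, the values $L = c = 1$, the truncation at 40 terms and the midpoint rule are all assumptions chosen for the example; the function names are hypothetical.

```python
import math

def sine_coefficients(f, L, n_max, samples=2000):
    """Approximate b_n = (2/L) * integral_0^L f(x) sin(n pi x / L) dx for
    n = 1..n_max, using the midpoint rule (an illustrative choice)."""
    h = L / samples
    xs = [(j + 0.5) * h for j in range(samples)]
    fs = [f(x) for x in xs]
    coeffs = []
    for n in range(1, n_max + 1):
        total = sum(fx * math.sin(n * math.pi * x / L) for x, fx in zip(xs, fs))
        coeffs.append(2.0 / L * total * h)
    return coeffs

def displacement(coeffs, x, t, L, c):
    """Truncated series (3.9): sum of b_n sin(n pi x / L) cos(n pi c t / L)."""
    return sum(
        b_n * math.sin(n * math.pi * x / L) * math.cos(n * math.pi * c * t / L)
        for n, b_n in enumerate(coeffs, start=1)
    )

# Hypothetical initial shape: a parabola vanishing at both ends of a unit string.
L, c = 1.0, 1.0
b = sine_coefficients(lambda x: x * (L - x), L, n_max=40)
print(displacement(b, 0.5, 0.0, L, c))    # should be close to f(0.5) = 0.25
print(displacement(b, 0.5, L / c, L, c))  # at t = L/c every cosine equals (-1)^n
```

For this symmetric initial shape, the second value should be close to $-0.25$: at $t = L/c$ the string is in the inverted configuration.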

Actually, the attentive reader will have noticed that we never showed that
the series $\sum_{n=1}^{\infty} b_n u_n(x)$, with $b_n$ given by the Euler formula (3.8), converges
to $f(x)$. In fact, it is possible to show that it does, but the convergence is not
necessarily pointwise (and certainly not uniform). We state without proof
the following result:
$$\lim_{N\to\infty} \int_0^L \left( f(x) - \sum_{n=1}^{N} b_n u_n(x) \right)^2 dx = 0 . \tag{3.10}$$
In other words, the function
$$h(x) = f(x) - \sum_{n=1}^{\infty} b_n u_n(x)$$
has the property that the integral
$$\int_0^L h(x)^2\, dx = 0 .$$
This however does not mean that $h(x) = 0$, but only that it is zero almost
everywhere.


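© To understand this, consider the (discontinuous) function
$$h(x) = \begin{cases} 1 , & \text{for } x = x_0, \text{ and} \\ 0 , & \text{otherwise.} \end{cases}$$
Then it is clear that the improper integral
$$\int_0^L h(x)^2\, dx = \lim_{r,s \searrow 0} \left[ \int_0^{x_0 - r} + \int_{x_0 + s}^{L} \right] h(x)^2\, dx = 0 .$$
The same would happen if $h(x)$ were zero but at a finite number of points.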


Of course, if $h(x)$ were continuous and zero almost everywhere, it would
have to be identically zero. In this case the convergence of the series (3.7)
would be pointwise. This is the case if $f(x)$ is itself continuous.

Expanding (3.10), we find that
$$\int_0^L f(x)^2\, dx - 2 \sum_{n=1}^{\infty} b_n \int_0^L f(x)\, u_n(x)\, dx + \sum_{n,m=1}^{\infty} b_n b_m \int_0^L u_n(x)\, u_m(x)\, dx = 0 .$$
Using (3.8) and (3.6) we can simplify this a little
$$\int_0^L f(x)^2\, dx - 2 \sum_{n=1}^{\infty} \frac{L}{2}\, b_n^2 + \sum_{n=1}^{\infty} \frac{L}{2}\, b_n^2 = 0 ,$$
whence
$$\sum_{n=1}^{\infty} b_n^2 = \frac{2}{L} \int_0^L f(x)^2\, dx .$$
Since $f(x)^2$ is continuous, it is integrable, and hence the right-hand side is
finite, whence the series $\sum_{n=1}^{\infty} b_n^2$ also converges. In particular, it means that
$\lim_{n\to\infty} b_n = 0$.


3.1.2 The Fourier series of a periodic function 


We have seen above that a continuous function $f(x)$ defined on the interval
$[0, L]$ and vanishing at the boundary, $f(0) = f(L) = 0$, can be expanded in
terms of the functions $u_n(x) = \sin(n\pi x/L)$. In this section we will generalise
this and consider similar expansions for periodic functions.

To be precise let $f(x)$ be a complex-valued function of a real variable
which is periodic with period $L$: $f(x + L) = f(x)$ for all $x$. Periodicity
means that $f(x)$ is uniquely determined by its behaviour within a period. In
other words, if we know $f$ in the interval $[0, L]$ then we know $f(x)$ everywhere.
Said differently, any function defined on $[0, L]$, obeying $f(0) = f(L)$, can be
extended to the whole real line as a periodic function. More generally, the
interval $[0, L]$ can be substituted by any one period $[x_0, x_0 + L]$, for some $x_0$
with the property that $f(x_0) = f(x_0 + L)$. This is not a useless generalisation:
it will be important when we discuss the case of $f(x)$ being a discontinuous
function. The strength of the Fourier expansion is that it treats discontinuous
functions (at least those with a finite number of discontinuities in any one
period) as easily as it treats continuous functions. The reason is, as we stated
briefly above, that the convergence of the series is not pointwise but rather
in the sense (3.10), which simply means that it converges pointwise almost
everywhere.

The functions $e_n(x) \equiv \exp(i 2\pi n x/L)$ are periodic with period $L$, since
$e_n(x + L) = e_n(x) \exp(i 2\pi n) = e_n(x)$. Therefore we could try to expand
$$f(x) = \sum_{n=-\infty}^{\infty} c_n\, e_n(x) , \tag{3.11}$$
for some complex coefficients $\{c_n\}$. This series is known as a trigonometric
or Fourier series of the periodic function $f$, and the $\{c_n\}$ are called the
Fourier coefficients. Under complex conjugation, the exponentials $e_n(x)$
satisfy $e_n(x)^* = e_{-n}(x)$, and also the following orthogonality property:
$$\int_0^L e_m(x)^*\, e_n(x)\, dx = \int_0^L e^{i 2\pi (n-m) x/L}\, dx = \begin{cases} L , & \text{if } n = m, \text{ and} \\ 0 , & \text{otherwise.} \end{cases}$$
Therefore if we multiply both sides of (3.11) by $e_m(x)^*$ and integrate, we find
the following formula for the Fourier coefficients:
$$c_m = \frac{1}{L} \int_0^L e_m(x)^*\, f(x)\, dx .$$


It is important to realise that the exponential functions $e_n(x)$ satisfy the
orthogonality relation for any one period, not necessarily $[0, L]$:
$$\int_{\text{one period}} e_m(x)^*\, e_n(x)\, dx = \begin{cases} L , & \text{if } n = m, \text{ and} \\ 0 , & \text{otherwise;} \end{cases} \tag{3.12}$$
whence the Fourier coefficients can be obtained by integrating over any one
period:
$$c_n = \frac{1}{L} \int_{\text{one period}} e_n(x)^*\, f(x)\, dx . \tag{3.13}$$

Again we can state without proof that the series converges pointwise
almost everywhere within any one period, in the sense that
$$\lim_{N\to\infty} \int_{\text{one period}} \left| f(x) - \sum_{n=-N}^{N} c_n\, e_n(x) \right|^2 dx = 0 ,$$
whenever the $\{c_n\}$ are given by (3.13).


© There is one special case where the series converges pointwise and uniformly. Let $g(z)$ be
a function which is analytic in an open annulus containing the unit circle $|z| = 1$. We saw
earlier that such a function is approximated uniformly by a Laurent series of the
form
$$g(z) = \sum_{n=-\infty}^{\infty} b_n z^n ,$$
where the $\{b_n\}$ are given by equations (2.53) and (2.54). Evaluating this on the unit circle
$z = e^{i\theta}$ we have that
$$g(e^{i\theta}) = \sum_{n=-\infty}^{\infty} b_n e^{i n \theta} , \tag{3.14}$$
and the coefficients $\{b_n\}$ are given by
$$b_n = \frac{1}{2\pi} \int_0^{2\pi} g(e^{i\theta})\, e^{-i n \theta}\, d\theta , \tag{3.15}$$
which agrees precisely with the Fourier series of the function $g(e^{i\theta})$, which is periodic with
period $2\pi$. We can rescale this by defining $\theta = 2\pi x/L$ where $x$ is periodic with period $L$.
Let $f(x) \equiv g(\exp(i 2\pi x/L))$, which is now periodic with period $L$. Then the Laurent series
(3.14) becomes the Fourier series (3.11) where the Laurent coefficients (3.15) are now given
by the Fourier coefficients (3.13).


Some examples 


Figure 3.1: Plot of $|\sin x|$ for $x \in [-\pi, \pi]$.

Let us now compute some examples of Fourier series. The first example is
the function $f(x) = |\sin x|$. A graph of this function shows that it is periodic
with period $\pi$, as seen in Figure 3.1. We therefore try an expansion of the
form
$$|\sin x| = \sum_{n=-\infty}^{\infty} c_n\, e^{i 2 n x} ,$$
where the coefficients $\{c_n\}$ are given by
$$c_n = \frac{1}{\pi} \int_0^{\pi} |\sin x|\, e^{-i 2 n x}\, dx = \frac{1}{\pi} \int_0^{\pi} \sin x\; e^{-i 2 n x}\, dx .$$


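We can expand $\sin x$ into exponentials to obtain
$$\begin{aligned}
c_n &= \frac{1}{2\pi i} \int_0^{\pi} \left( e^{ix} - e^{-ix} \right) e^{-i 2 n x}\, dx \\
&= \frac{1}{2\pi i} \left[ \int_0^{\pi} e^{-i(2n-1)x}\, dx - \int_0^{\pi} e^{-i(2n+1)x}\, dx \right] \\
&= \frac{1}{2\pi i} \left[ \frac{i}{2n-1} \left( e^{-i(2n-1)\pi} - 1 \right) - \frac{i}{2n+1} \left( e^{-i(2n+1)\pi} - 1 \right) \right] \\
&= \frac{1}{2\pi i}\, (-2i) \left[ \frac{1}{2n-1} - \frac{1}{2n+1} \right] \\
&= -\frac{2}{\pi}\, \frac{1}{4n^2 - 1} .
\end{aligned}$$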

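Therefore,
$$|\sin x| = \sum_{n=-\infty}^{\infty} -\frac{2}{\pi}\, \frac{1}{4n^2 - 1}\, e^{i 2 n x} = \frac{2}{\pi} - \frac{4}{\pi} \sum_{n=1}^{\infty} \frac{1}{4n^2 - 1} \cos 2 n x .$$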

Notice that this can be used in order to compute infinite sums. Evaluating
this at $x = 0$, we have that
$$\sum_{n=1}^{\infty} \frac{1}{4n^2 - 1} = \frac{1}{2} ,$$
whereas evaluating this at $x = \pi/2$, we have that
$$\sum_{n=1}^{\infty} \frac{(-1)^n}{4n^2 - 1} = \frac{2 - \pi}{4} .$$
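Both the coefficients and the two sums can be checked numerically. The sketch below is a hedged illustration: the helper names `fourier_coefficient` and `closed_form`, the midpoint rule and the truncation of the sums at 100000 terms are assumptions made for this example.

```python
import cmath
import math

def fourier_coefficient(f, n, period, samples=4000):
    """Midpoint-rule estimate of c_n = (1/L) * integral over [0, L] of
    f(x) exp(-i 2 pi n x / L) dx, where L is the period."""
    h = period / samples
    total = 0j
    for j in range(samples):
        x = (j + 0.5) * h
        total += f(x) * cmath.exp(-2j * math.pi * n * x / period)
    return total * h / period

def closed_form(n):
    """Coefficient derived in the text for |sin x|: -(2/pi) / (4 n^2 - 1)."""
    return -2.0 / (math.pi * (4 * n * n - 1))

# Compare the numerical and the closed-form coefficients.
for n in range(-2, 3):
    numeric = fourier_coefficient(lambda x: abs(math.sin(x)), n, math.pi)
    print(n, numeric.real, closed_form(n))

# Partial sums of the two series evaluated in the text.
sum_plain = sum(1.0 / (4 * n * n - 1) for n in range(1, 100001))
sum_alternating = sum((-1) ** n / (4 * n * n - 1) for n in range(1, 100001))
print(sum_plain, 0.5)
print(sum_alternating, (2 - math.pi) / 4)
```

The first sum converges slowly (its tail decays like $1/N$), so agreement to only a few decimal places should be expected there.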


Of course, we could have summed these series using the residue theorem, as
explained in Section 2.4.5.

Figure 3.2: Plot of $f(x)$ for $x \in [-2\pi, 2\pi]$.


As a second example, let us consider the function $f(x)$ defined in the
interval $[-\pi, \pi]$ by
$$f(x) = \begin{cases} -1 - \dfrac{2}{\pi}\, x , & \text{if } -\pi \le x \le 0, \text{ and} \\[1ex] -1 + \dfrac{2}{\pi}\, x , & \text{if } 0 \le x \le \pi, \end{cases} \tag{3.16}$$
and extended periodically to the whole real line. A plot of this function for
$x \in [-2\pi, 2\pi]$ is shown in Figure 3.2. It is clear from the picture that $f(x)$
has periodicity $2\pi$, whence we expect a Fourier series of the form


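$$f(x) = \sum_{n=-\infty}^{\infty} c_n\, e^{i n x} ,$$
where the coefficients are given by
$$\begin{aligned}
c_n &= \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x)\, e^{-i n x}\, dx \\
&= \frac{1}{2\pi} \left[ \int_{-\pi}^{0} \left( -1 - \frac{2}{\pi}\, x \right) e^{-i n x}\, dx + \int_0^{\pi} \left( -1 + \frac{2}{\pi}\, x \right) e^{-i n x}\, dx \right] \\
&= \frac{1}{2\pi} \int_0^{\pi} \left( -1 + \frac{2}{\pi}\, x \right) \left[ e^{i n x} + e^{-i n x} \right] dx \\
&= -\frac{1}{\pi} \int_0^{\pi} \cos(n x)\, dx + \frac{2}{\pi^2} \int_0^{\pi} x \cos(n x)\, dx .
\end{aligned}$$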


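We must distinguish between $n = 0$ and $n \neq 0$. Performing the elementary
integrals for both of these cases, we arrive at
$$c_n = \begin{cases} \dfrac{2}{\pi^2 n^2} \left[ (-1)^n - 1 \right] , & \text{for } n \neq 0, \text{ and} \\[1ex] 0 , & \text{for } n = 0. \end{cases} \tag{3.17}$$
Therefore we have that
$$\begin{aligned}
f(x) &= \sum_{\substack{n=-\infty \\ n \neq 0}}^{\infty} \frac{2}{\pi^2 n^2} \left[ (-1)^n - 1 \right] e^{i n x} \\
&= \sum_{n=1}^{\infty} \frac{4}{\pi^2 n^2} \left[ (-1)^n - 1 \right] \cos n x \\
&= \sum_{\substack{n \ge 1 \\ n \text{ odd}}} -\frac{8}{\pi^2 n^2} \cos n x \\
&= -\frac{8}{\pi^2} \sum_{\ell=0}^{\infty} \frac{1}{(2\ell+1)^2} \cos (2\ell+1) x .
\end{aligned}$$

Figure 3.3: Plot of $g(x)$ for $x \in [-3\pi, 3\pi]$.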


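Finally we consider the case of a discontinuous function:
$$g(x) = \frac{x}{\pi} , \quad \text{where } x \in [-\pi, \pi] ,$$
and extended periodically to the whole real line. The function has period
$2\pi$, and so we expect a series expansion of the form
$$g(x) = \sum_{n=-\infty}^{\infty} c_n\, e^{i n x} ,$$
where the Fourier coefficients are given by
$$c_n = \frac{1}{2\pi} \int_{-\pi}^{\pi} \frac{x}{\pi}\, e^{-i n x}\, dx .$$
We must distinguish the cases $n = 0$ and $n \neq 0$. In either case we can
perform the elementary integrals to arrive at
$$c_n = \begin{cases} 0 , & \text{if } n = 0, \text{ and} \\[1ex] \dfrac{i}{\pi n}\, (-1)^n , & \text{otherwise.} \end{cases}$$
Therefore,
$$g(x) = \sum_{\substack{n=-\infty \\ n \neq 0}}^{\infty} \frac{i}{\pi n}\, (-1)^n\, e^{i n x} = -\frac{2}{\pi} \sum_{n=1}^{\infty} \frac{(-1)^n}{n} \sin n x .$$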


Now notice something curious: the function $g(x)$ is discontinuous at $x = (2\ell + 1)\pi$.
Evaluating the series at such values of $x$ we see that because
$\sin n(2\ell + 1)\pi = 0$, the series sums to zero for these values. In other words, $g(x)$ is
only equal to the Fourier series at those values $x$ where $g(x)$ is continuous.
At the values where $g(x)$ is discontinuous, the Fourier series can be shown to
converge to the mean of the left and right limits of the function: in this case,
$\lim_{x \searrow \pi} g(x) = -1$ and $\lim_{x \nearrow \pi} g(x) = 1$, and the average is $0$, in agreement
with what we just saw.
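This behaviour can be observed directly on the partial sums. The following Python sketch evaluates a truncation of the sine series of $g(x) = x/\pi$; the function name and the number of terms are illustrative assumptions.

```python
import math

def sawtooth_partial_sum(x, terms):
    """Partial sum of -(2/pi) * sum_{n>=1} (-1)^n sin(n x) / n, the series
    of g(x) = x/pi on [-pi, pi] extended periodically."""
    return -2.0 / math.pi * sum(
        (-1) ** n * math.sin(n * x) / n for n in range(1, terms + 1)
    )

print(sawtooth_partial_sum(math.pi / 2, 2000))  # point of continuity, g = 1/2
print(sawtooth_partial_sum(math.pi, 2000))      # discontinuity: mean of 1 and -1
```

At $x = \pi/2$ the partial sums approach $g(\pi/2) = 1/2$ slowly, whereas at $x = \pi$ every term vanishes (up to rounding), in line with the average of the two one-sided limits.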


3.1.3 Some properties of the Fourier series 


In this section we explore some general properties of the Fourier series of a 
complex periodic function f(x) with period L. 

Let us start with the following observation. If $f(x)$ is real, then the
Fourier coefficients obey $c_n^* = c_{-n}$. This follows from the following. Taking
the complex conjugate of the Fourier series for $f(x)$, we have
$$f(x)^* = \left( \sum_{n=-\infty}^{\infty} c_n\, e_n(x) \right)^* = \sum_{n=-\infty}^{\infty} c_n^*\, e_{-n}(x) ,$$
where we have used that $e_n(x)^* = e_{-n}(x)$. Since $f(x)$ is real, $f(x) = f(x)^*$
for all $x$, whence
$$\sum_{n=-\infty}^{\infty} c_n\, e_n(x) = \sum_{n=-\infty}^{\infty} c_n^*\, e_{-n}(x) = \sum_{n=-\infty}^{\infty} c_{-n}^*\, e_n(x) .$$
Multiplying both sides of the equation by $e_m(x)^*$, integrating over one period
and using the orthogonality relation (3.12), we find that $c_m = c_{-m}^*$.


Fourier sine and cosine series


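Suppose that $f(x)$ is periodic and also even, so that $f(-x) = f(x)$. Then
this means that
$$f(x) = \tfrac12 \left[ f(x) + f(-x) \right] .$$
If we substitute its Fourier series
$$f(x) = \sum_{n=-\infty}^{\infty} c_n\, e_n(x) ,$$
we see that
$$f(x) = \sum_{n=-\infty}^{\infty} \tfrac12\, c_n \left[ e_n(x) + e_n(-x) \right] = \sum_{n=-\infty}^{\infty} c_n \cos \lambda_n x ,$$
where $\lambda_n = 2\pi n/L$. Now we use the fact that $\cos \lambda_{-n} x = \cos \lambda_n x$ to rewrite
the series as
$$f(x) = c_0 + \sum_{n=1}^{\infty} \left[ c_n + c_{-n} \right] \cos \lambda_n x = \tfrac12\, a_0 + \sum_{n=1}^{\infty} a_n \cos \lambda_n x ,$$
where $a_n \equiv c_n + c_{-n}$. Using (3.13) we find the following expression for the
$\{a_n\}$:
$$a_n = \frac{2}{L} \int_{\text{one period}} \cos \lambda_n x\; f(x)\, dx .$$
The above expression for $f(x)$ as a sum of cosines is known as a Fourier
cosine series and the $\{a_n\}$ are the Fourier cosine coefficients.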

Similarly, one can consider the Fourier series of an odd periodic function
$f(-x) = -f(x)$. Now we have that
$$f(x) = \tfrac12 \left[ f(x) - f(-x) \right] ,$$
which, when we substitute its Fourier series, becomes
$$f(x) = \sum_{n=-\infty}^{\infty} \tfrac12\, c_n \left[ e_n(x) - e_n(-x) \right] = i \sum_{n=-\infty}^{\infty} c_n \sin \lambda_n x .$$
Now we use the fact that $\sin \lambda_{-n} x = -\sin \lambda_n x$, and that $\lambda_0 = 0$, to rewrite
the series as
$$f(x) = i \sum_{n=1}^{\infty} \left[ c_n - c_{-n} \right] \sin \lambda_n x = \sum_{n=1}^{\infty} b_n \sin \lambda_n x ,$$
where $b_n \equiv i \left[ c_n - c_{-n} \right]$. Using (3.13) we find the following expression for the
$\{b_n\}$:
$$b_n = \frac{2}{L} \int_{\text{one period}} \sin \lambda_n x\; f(x)\, dx .$$
The above expression for $f(x)$ as a sum of sines is known as a Fourier sine
series and the $\{b_n\}$ are the Fourier sine coefficients.

Any function can be decomposed into the sum of an odd and an even
function and this is reflected in the fact that the complex exponential $e_n(x)$
can be decomposed into a sum of a cosine and a sine: $e_n(x) = \cos \lambda_n x + i \sin \lambda_n x$.
Therefore for $f(x)$ periodic, we have
$$\begin{aligned}
f(x) &= \sum_{n=-\infty}^{\infty} c_n\, e_n(x) = \sum_{n=-\infty}^{\infty} c_n \left[ \cos \lambda_n x + i \sin \lambda_n x \right] \\
&= \tfrac12\, a_0 + \sum_{n=1}^{\infty} a_n \cos \lambda_n x + \sum_{n=1}^{\infty} b_n \sin \lambda_n x ,
\end{aligned}$$
where the first two terms comprise a Fourier cosine series and the last term
is a Fourier sine series.


Parseval’s identity 


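Let $f(x)$ be a complex periodic function and let us compute the following
integral
$$\|f\|^2 \equiv \frac{1}{L} \int_{\text{one period}} |f(x)|^2\, dx ,$$
using the Fourier series:
$$\|f\|^2 = \frac{1}{L} \int_{\text{one period}} \left| \sum_{n=-\infty}^{\infty} c_n\, e_n(x) \right|^2 dx .$$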


Expanding the right-hand side and interchanging the order of integration and
summation, we have
$$\|f\|^2 = \frac{1}{L} \sum_{n,m=-\infty}^{\infty} c_n\, c_m^* \int_{\text{one period}} e_m(x)^*\, e_n(x)\, dx = \sum_{n=-\infty}^{\infty} |c_n|^2 ,$$
where we have used the orthogonality relation (3.12). In other words, we
have derived Parseval’s identity:
$$\sum_{n=-\infty}^{\infty} |c_n|^2 = \|f\|^2 . \tag{3.18}$$
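Parseval’s identity can be tested on the discontinuous example $g(x) = x/\pi$ treated earlier, whose coefficients are $c_n = i(-1)^n/(\pi n)$ for $n \neq 0$ and $c_0 = 0$. The Python sketch below is an illustration under assumptions: the helper names, the midpoint rule and the truncation at 100000 terms are choices made here.

```python
import math

def norm_squared(f, a, b, samples=20000):
    """(1/L) * integral over one period [a, b] of |f(x)|^2 dx (midpoint rule)."""
    h = (b - a) / samples
    return sum(abs(f(a + (j + 0.5) * h)) ** 2 for j in range(samples)) * h / (b - a)

def coefficient_sum(terms):
    """Sum of |c_n|^2 over 0 < |n| <= terms, with |c_n| = 1 / (pi |n|)."""
    return 2.0 * sum(1.0 / (math.pi * n) ** 2 for n in range(1, terms + 1))

lhs = coefficient_sum(100000)
rhs = norm_squared(lambda x: x / math.pi, -math.pi, math.pi)
print(lhs, rhs)  # both should be close to 1/3
```

Both sides can also be evaluated in closed form here: $\|g\|^2 = 1/3$, and $2 \sum_{n \geq 1} 1/(\pi n)^2 = 1/3$ as well.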


© © Explain the Fourier series as setting up an isometry between $L^2$ and $\ell^2$.


The Dirac delta “function” 


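Let us insert the expression (3.13) for the Fourier coefficients back into the
Fourier series (3.11) for a periodic function $f(x)$:
$$f(x) = \sum_{n=-\infty}^{\infty} \left[ \frac{1}{L} \int_{\text{one period}} e_n(y)^*\, f(y)\, dy \right] e_n(x) .$$
Interchanging the order of summation and integration, we find
$$f(x) = \int_{\text{one period}} \left[ \frac{1}{L} \sum_{n=-\infty}^{\infty} e_n(y)^*\, e_n(x) \right] f(y)\, dy = \int_{\text{one period}} \delta(x - y)\, f(y)\, dy ,$$
where we have introduced the Dirac delta “function”
$$\delta(x - y) \equiv \frac{1}{L} \sum_{n=-\infty}^{\infty} e_n(y)^*\, e_n(x) = \frac{1}{L} \sum_{n=-\infty}^{\infty} e^{i 2\pi n (x - y)/L} . \tag{3.19}$$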


Despite its name, the delta function is not a function, even though it is a limit
of functions. Instead it is a distribution. Distributions are only well-defined
when integrated against sufficiently well-behaved functions known as test
functions. The delta function is the distribution defined by the condition:
$$\int_{\text{one period}} \delta(x - y)\, f(y)\, dy = f(x) .$$
In particular,
$$\int_{\text{one period}} \delta(y)\, dy = 1 ;$$
hence it depends on the region of integration. This is clear from the above
expression which has an explicit dependence on the period $L$. In the following
section, we will see another delta function adapted to a different region of
integration: the whole real line.
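Although the sum (3.19) does not converge as a function, its truncations behave as the defining condition suggests when integrated against a smooth periodic function. The Python sketch below illustrates this; the truncation order, the test function $e^{\cos y}$ and the helper names are hypothetical choices made for the example.

```python
import math

def delta_N(u, N, L):
    """Truncation of (3.19): (1/L) * sum over |n| <= N of exp(i 2 pi n u / L).
    The sum is real, so it is written in terms of cosines."""
    return (1.0 + 2.0 * sum(math.cos(2 * math.pi * n * u / L) for n in range(1, N + 1))) / L

def smear(f, x, N, L, samples=400):
    """Integral over one period [0, L] of delta_N(x - y) f(y) dy, using the
    rectangle rule (very accurate for smooth periodic integrands)."""
    h = L / samples
    return sum(delta_N(x - j * h, N, L) * f(j * h) for j in range(samples)) * h

L = 2 * math.pi
f = lambda y: math.exp(math.cos(y))  # hypothetical smooth test function of period 2 pi
print(smear(f, 1.0, 12, L), f(1.0))
```

The two printed numbers should agree to many decimal places, because the Fourier coefficients of this particular test function decay extremely fast.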


3.1.4 Application: steady-state response 


We now come to one of the main applications of the Fourier series: finding 
steady-state solutions to differential equations. 
Consider a system governed by a differential equation
$$\frac{d^2\phi(t)}{dt^2} + a_1\, \frac{d\phi(t)}{dt} + a_0\, \phi(t) = e^{i\omega t} .$$
The function $\phi(t)$ can be understood as the response of the system which
is being driven by a sinusoidal force $e^{i\omega t}$. After sufficient time has elapsed,
or assuming that we have been driving the system in this fashion for an
infinitely long time, say, for all $t < 0$, a realistic system will be in a so-called
steady state, in which $\phi(t) = A(\omega)\, e^{i\omega t}$. The reason is that energy dissipates
in a realistic system due to damping or friction, so that in the absence of the
driving term, the system will tend to lose all its energy, so that $\phi(t) \to 0$ in
the limit as $t \to \infty$. To find the steady-state response of the above system
one then substitutes $\phi(t) = A(\omega)\, e^{i\omega t}$ in the equation and solves for $A(\omega)$:
$$\frac{d^2\phi(t)}{dt^2} + a_1\, \frac{d\phi(t)}{dt} + a_0\, \phi(t) = A(\omega) \left( -\omega^2 + i\, a_1\, \omega + a_0 \right) e^{i\omega t} = e^{i\omega t} ,$$
whence
$$A(\omega) = \frac{1}{-\omega^2 + i\, a_1\, \omega + a_0} .$$


In practice, however, one would like to analyse the steady-state response
of a system which is being driven not by a simple sinusoidal function but
by a general periodic function $f(t)$, with period $T$. This suggests that we
expand the driving force in terms of a Fourier series:
$$f(t) = \sum_{n=-\infty}^{\infty} c_n\, e^{i 2\pi n t/T} ,$$
where the coefficients are given by
$$c_n = \frac{1}{T} \int_{\text{one period}} f(t)\, e^{-i 2\pi n t/T}\, dt .$$
Above we found the steady-state response of the system for the sinusoidal
forces $\exp(i 2\pi n t/T)$, namely
$$\phi_n(t) = \frac{1}{-4\pi^2 n^2/T^2 + 2\pi i\, a_1\, n/T + a_0}\; e^{i 2\pi n t/T} .$$


Because the equation is linear, we see that the response to a force which is
a linear combination of simple sinusoids will be the same linear combination
of the responses to the simple sinusoidal forces. Assuming that this can be
extended to infinite linear combinations,² we see that since $\phi_n(t)$ solves the
differential equation for the driving force $\exp(i 2\pi n t/T)$, then the series
$$\phi(t) = \sum_{n=-\infty}^{\infty} c_n\, \phi_n(t)$$
solves the differential equation for the driving force
$$f(t) = \sum_{n=-\infty}^{\infty} c_n\, e^{i 2\pi n t/T} .$$


As an example, let us consider the differential equation
$$\frac{d^2\phi(t)}{dt^2} + 2\, \frac{d\phi(t)}{dt} + 2\, \phi(t) = f(t) ,$$
where $f(t)$ is the periodic function defined in (3.16). This function has period
$T = 2\pi$ and according to what was said above, the solution of this
equation is

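$$\phi(t) = \sum_{n=-\infty}^{\infty} \frac{c_n}{-n^2 + 2 i n + 2}\; e^{i n t} ,$$
where the coefficients $c_n$ are given in (3.17). Explicitly, we have
$$\phi(t) = \sum_{\substack{n=-\infty \\ n \neq 0}}^{\infty} \frac{2 \left( (-1)^n - 1 \right)}{\pi^2 n^2}\; \frac{e^{i n t}}{-n^2 + 2 i n + 2} .$$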
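The series for $\phi(t)$ can be summed numerically and substituted back into the differential equation. The Python sketch below does this with termwise derivatives of a truncated series; the truncation order $N$, the sample time $t = 0.7$ and the function names are assumptions made for this illustration.

```python
import cmath
import math

def c(n):
    """Fourier coefficients (3.17) of the triangular wave (3.16); c_0 = 0."""
    return 0.0 if n == 0 else 2.0 * ((-1) ** n - 1) / (math.pi * n) ** 2

def response(t, N=2000, derivative=0):
    """Truncated series for phi(t), or for its termwise derivative of the
    given order (each derivative brings down a factor of i n)."""
    total = 0j
    for n in range(-N, N + 1):
        if n % 2 == 0:  # c_n vanishes for even n, including n = 0
            continue
        total += c(n) * (1j * n) ** derivative * cmath.exp(1j * n * t) / (-n * n + 2j * n + 2)
    return total.real

def f(t):
    """Triangular wave (3.16), extended with period 2 pi."""
    x = (t + math.pi) % (2 * math.pi) - math.pi
    return -1.0 + 2.0 * abs(x) / math.pi

t = 0.7
residual = response(t, derivative=2) + 2 * response(t, derivative=1) + 2 * response(t) - f(t)
print(residual)  # should be small if the truncated series nearly solves the equation
```

The residual is not exactly zero because the truncated series solves the equation for the truncated driving force; it should shrink roughly like $1/N$ as more terms are kept.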


² Fourier series, since they contain an infinite number of terms, are limiting cases of
linear combinations and strictly speaking we would have to justify that, for example, the
derivative of the series is the series of termwise derivatives. This would follow if the series
were uniformly convergent, for example. In the absence of general theorems, which will
be the case in this course, one has to justify this a posteriori.

We would now have to check that $\phi(t)$ is twice differentiable. It could not
be differentiable three times because, by the defining equation, the second
derivative is given by
$$\frac{d^2\phi(t)}{dt^2} = f(t) - 2\, \frac{d\phi(t)}{dt} - 2\, \phi(t) ,$$
and $f(t)$ is not differentiable. The twice-differentiability of $\phi(t)$ follows from
the uniform convergence of the above series for $\phi(t)$. To see this we apply
the Weierstrass M-test:
$$\left| \frac{2 \left( (-1)^n - 1 \right)}{\pi^2 n^2}\; \frac{e^{i n t}}{-n^2 + 2 i n + 2} \right| \le \frac{8}{\pi^2 n^4} ,$$
and the series
$$\sum_{\substack{n=-\infty \\ n \neq 0}}^{\infty} \frac{8}{\pi^2 n^4}$$
is absolutely convergent. Every time we take a derivative with respect to $t$,
we bring down a factor of $i n$, hence we see that the series for $\phi(t)$ can be
legitimately differentiated termwise only twice, since the series
$$\sum_{\substack{n=-\infty \\ n \neq 0}}^{\infty} \frac{8}{\pi^2 |n|^3} \quad \text{and} \quad \sum_{\substack{n=-\infty \\ n \neq 0}}^{\infty} \frac{8}{\pi^2 n^2}$$
are absolutely convergent, but
$$\sum_{\substack{n=-\infty \\ n \neq 0}}^{\infty} \frac{8}{\pi^2 |n|}$$
is not.


Green’s functions 


Let us return for a moment to the general second order differential equation
treated above:
$$\frac{d^2\phi(t)}{dt^2} + a_1\, \frac{d\phi(t)}{dt} + a_0\, \phi(t) = f(t) , \tag{3.20}$$
where $f(t)$ is periodic with period $T$ and can be expanded in a Fourier series
$$f(t) = \sum_{n=-\infty}^{\infty} c_n\, e^{i 2\pi n t/T} .$$
Then as we have just seen, the solution is given by
$$\phi(t) = \sum_{n=-\infty}^{\infty} \frac{c_n\, e^{i 2\pi n t/T}}{-4\pi^2 n^2/T^2 + i\, a_1\, 2\pi n/T + a_0} ,$$
where the coefficients $c_n$ are given by
$$c_n = \frac{1}{T} \int_{\text{one period}} f(t)\, e^{-i 2\pi n t/T}\, dt .$$
Inserting this back into the solution, we find
$$\phi(t) = \sum_{n=-\infty}^{\infty} \left[ \frac{1}{T} \int_{\text{one period}} f(\tau)\, e^{-i 2\pi n \tau/T}\, d\tau \right] \frac{e^{i 2\pi n t/T}}{-4\pi^2 n^2/T^2 + i\, a_1\, 2\pi n/T + a_0} .$$


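Interchanging the order of summation and integration,
$$\phi(t) = \int_{\text{one period}} \left[ \frac{1}{T} \sum_{n=-\infty}^{\infty} \frac{e^{i 2\pi n (t - \tau)/T}}{-4\pi^2 n^2/T^2 + i\, a_1\, 2\pi n/T + a_0} \right] f(\tau)\, d\tau ,$$
which we can write as
$$\phi(t) = \int_{\text{one period}} G(t - \tau)\, f(\tau)\, d\tau , \tag{3.21}$$
where
$$G(t) \equiv \sum_{n=-\infty}^{\infty} \frac{T\, e^{i 2\pi n t/T}}{-4\pi^2 n^2 + i\, a_1\, 2\pi n T + a_0 T^2}$$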


is the Green’s function for the above equation. It is defined (formally) as
the solution of the differential equation
$$\frac{d^2 G(t)}{dt^2} + a_1\, \frac{dG(t)}{dt} + a_0\, G(t) = \delta(t) , \tag{3.22}$$
where
$$\delta(t) = \frac{1}{T} \sum_{n=-\infty}^{\infty} e^{i 2\pi n t/T}$$
is the Dirac delta “function.” In other words, the Green’s function is the
response of the system to a delta function. It should be clear that if $G(t)$
satisfies (3.22) then $\phi(t)$ given by (3.21) satisfies the original equation (3.20):
$$\begin{aligned}
\frac{d^2\phi(t)}{dt^2} + a_1\, \frac{d\phi(t)}{dt} + a_0\, \phi(t)
&= \int_{\text{one period}} \left( \frac{d^2 G(t - \tau)}{dt^2} + a_1\, \frac{dG(t - \tau)}{dt} + a_0\, G(t - \tau) \right) f(\tau)\, d\tau \\
&= \int_{\text{one period}} \delta(t - \tau)\, f(\tau)\, d\tau = f(t) .
\end{aligned}$$


3.2 The Fourier transform 


In the previous section we have seen how to expand a periodic function as a
trigonometric series. This can be thought of as a decomposition of a periodic
function in terms of elementary modes, each of which has a definite frequency
allowed by the periodicity. If the function has period $L$, then the frequencies
must be integer multiples of the fundamental frequency $k = 2\pi/L$. In this
section we would like to establish a similar decomposition for functions which
are not periodic. A non-periodic function can be thought of as a periodic
function in the limit $L \to \infty$. Clearly, the larger $L$ is, the less frequently the
function repeats, until in the limit $L \to \infty$ the function does not repeat at
all. In the limit $L \to \infty$ the allowed frequencies become a continuum and the
Fourier sum goes over to a Fourier integral. In this section we will discuss
this integral as well as some of its basic properties, and apply it to a variety
of situations: solution of the wave equation and steady-state solutions to
differential equations. As in the previous section we will omit most of the
analytic details which are necessary to justify the cavalier operations we will
be performing.


3.2.1 The Fourier integral 


Consider a function $f(x)$ defined on the real line. If $f(x)$ were periodic with
period $L$, say, we could try to expand $f(x)$ in a Fourier series converging to
it almost everywhere within each period
$$f(x) = \sum_{n=-\infty}^{\infty} c_n\, e^{i 2\pi n x/L} ,$$
where the coefficients $\{c_n\}$ are given by
$$c_n = \frac{1}{L} \int_{-L/2}^{L/2} f(x)\, e^{-i 2\pi n x/L}\, dx , \tag{3.23}$$


where we have chosen the period to be $[-L/2, L/2]$ for convenience in what
follows. Even if $f(x)$ is not periodic, we can still define a function
$$f_L(x) = \sum_{n=-\infty}^{\infty} c_n\, e^{i 2\pi n x/L} , \tag{3.24}$$
with the same $\{c_n\}$ as above. By construction, this function $f_L(x)$ is periodic
with period $L$ and moreover agrees with $f(x)$ for almost all $x \in [-L/2, L/2]$.
Then it is clear that as we make $L$ larger and larger, $f_L(x)$ and $f(x)$
agree (almost everywhere) on a larger and larger subset of the real line. One
should expect that in the limit $L \to \infty$, $f_L(x)$ should converge to $f(x)$ in
some sense. The task ahead is to find reasonable expressions for the limit
$L \to \infty$ of the expression (3.24) of $f_L(x)$ and of the coefficients (3.23).


© The continuum limit in detail. 


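This prompts us to define the Fourier (integral) transform of the function
$f(x)$ as
$$\mathcal{F}\{f\}(k) \equiv \hat f(k) = \frac{1}{2\pi} \int_{-\infty}^{\infty} f(x)\, e^{-i k x}\, dx , \tag{3.25}$$
provided that the integral exists. Not every function $f(x)$ has a Fourier
transform. A sufficient condition is that it be square-integrable; that is, so
that the following integral converges:
$$\|f\|^2 \equiv \int_{-\infty}^{\infty} |f(x)|^2\, dx .$$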


If in addition to being square-integrable, the function is continuous, then one
also has the inversion formula
$$f(x) = \int_{-\infty}^{\infty} \hat f(k)\, e^{i k x}\, dk . \tag{3.26}$$
More generally, one has the Fourier inversion theorem, which states that if
$f(x)$ is square-integrable, then the Fourier transform $\hat f(k)$ exists and moreover
$$\int_{-\infty}^{\infty} \hat f(k)\, e^{i k x}\, dk = \begin{cases} f(x) , & \text{if } f \text{ is continuous at } x, \text{ and} \\[1ex] \tfrac12 \left[ \lim_{y \nearrow x} + \lim_{y \searrow x} \right] f(y) , & \text{otherwise.} \end{cases}$$
In other words, at a point of discontinuity, the inverse transform produces
the average of the left and right limiting values of the function $f$. This was
also the case with the Fourier series. In any case, assuming that the function
$f(x)$ is such that its points of discontinuity are isolated, then the inverse
transform will agree with $f(x)$ everywhere but at the discontinuities.


Some examples 


Before discussing any general properties of the Fourier transform, let us compute
some examples.

Let $f(x) = 1/(4 + x^2)$. This function is clearly square-integrable. Indeed,
the integral
$$\|f\|^2 = \int_{-\infty}^{\infty} \frac{1}{(4 + x^2)^2}\, dx$$
can be computed using the residue theorem, as we did earlier. We
will not do the calculation in detail, but simply remark that $\|f\|^2 = \pi/16$.
Therefore its Fourier transform exists:
$$\hat f(k) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{e^{-i k x}}{4 + x^2}\, dx .$$
We can compute this integral using the residue theorem. According to equation
(2.64), we have that for $k < 0$, we pick up the residues of the poles in
the upper half-plane, whereas for $k > 0$ we pick up the poles in the lower
half-plane. The function $\exp(-i k z)/(4 + z^2)$ has simple poles at $z = \pm 2i$.
Therefore we have


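$$\hat f(k) = \begin{cases} \dfrac{1}{2\pi}\, 2\pi i\, \operatorname{Res}(2i) , & \text{if } k < 0, \text{ and} \\[1ex] \dfrac{1}{2\pi}\, (-2\pi i)\, \operatorname{Res}(-2i) , & \text{if } k > 0; \end{cases}
\;=\; \begin{cases} \tfrac14\, e^{2k} , & \text{if } k < 0, \text{ and} \\[1ex] \tfrac14\, e^{-2k} , & \text{if } k > 0. \end{cases}$$
We can also verify the inversion formula. Indeed,
$$\begin{aligned}
\int_{-\infty}^{\infty} \hat f(k)\, e^{i k x}\, dk
&= \int_{-\infty}^{0} \tfrac14\, e^{2k}\, e^{i k x}\, dk + \int_0^{\infty} \tfrac14\, e^{-2k}\, e^{i k x}\, dk \\
&= \frac14 \int_{-\infty}^{0} e^{(2 + i x) k}\, dk + \frac14 \int_0^{\infty} e^{-(2 - i x) k}\, dk \\
&= \frac14 \left[ \frac{1}{2 + i x} + \frac{1}{2 - i x} \right] = \frac{1}{4 + x^2} .
\end{aligned}$$

As a second example, consider the function
$$f(x) = \begin{cases} 1 , & \text{for } |x| < \pi, \text{ and} \\ 0 , & \text{otherwise.} \end{cases}$$
It is clearly square-integrable, with $\|f\|^2 = 2\pi$. Its Fourier transform is given
by
$$\hat f(k) = \frac{1}{2\pi} \int_{-\infty}^{\infty} f(x)\, e^{-i k x}\, dx = \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{-i k x}\, dx = \frac{\sin \pi k}{\pi k} .$$

We will not verify the inversion formula in this example. If we were to do
this we would be able to evaluate the integral in the inversion formula for
$x \neq \pm\pi$ and we would obtain $f(x)$ for those values. The residue methods
fail at the discontinuities $x = \pm\pi$, and one has to appeal to more advanced
methods we will not discuss in this course.

Finally consider the Fourier transform of a finite wave train:
$$f(x) = \begin{cases} \sin x , & \text{for } |x| \le 6\pi, \text{ and} \\ 0 , & \text{otherwise.} \end{cases}$$
This function is clearly square-integrable, since
$$\|f\|^2 = \int_{-6\pi}^{6\pi} (\sin x)^2\, dx = 6\pi .$$
Its Fourier transform is given by
$$\begin{aligned}
\hat f(k) &= \frac{1}{2\pi} \int_{-\infty}^{\infty} f(x)\, e^{-i k x}\, dx \\
&= \frac{1}{2\pi} \int_{-6\pi}^{6\pi} \sin x\; e^{-i k x}\, dx \\
&= \frac{1}{4\pi i} \int_{-6\pi}^{6\pi} \left( e^{i(1-k)x} - e^{-i(1+k)x} \right) dx \\
&= \frac{1}{4\pi i} \left[ \frac{1}{i(1-k)} \left( e^{-i k 6\pi} - e^{i k 6\pi} \right) + \frac{1}{i(1+k)} \left( e^{-i k 6\pi} - e^{i k 6\pi} \right) \right] \\
&= \frac{i \sin 6\pi k}{2\pi} \left[ \frac{1}{1-k} + \frac{1}{1+k} \right] \\
&= \frac{i \sin 6\pi k}{\pi (1 - k^2)} .
\end{aligned}$$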

We will not verify the inversion formula for this transform; although in this 
case the formula holds for all x since the original function is continuous. 
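The three transforms found in these examples can be compared with a direct numerical evaluation of the Fourier integral (3.25). In the Python sketch below, the function name `fourier_transform`, the midpoint rule, the cutoff at $|x| = 200$ for the first example and the sample values of $k$ are assumptions made for the illustration.

```python
import cmath
import math

def fourier_transform(f, k, a, b, samples=40000):
    """Midpoint-rule estimate of (1/(2 pi)) * integral_a^b f(x) exp(-i k x) dx.
    For functions that do not vanish outside [a, b] this cuts off the
    infinite range of (3.25) and is therefore only an approximation."""
    h = (b - a) / samples
    total = 0j
    for j in range(samples):
        x = a + (j + 0.5) * h
        total += f(x) * cmath.exp(-1j * k * x)
    return total * h / (2 * math.pi)

lorentzian = lambda x: 1.0 / (4.0 + x * x)
box = lambda x: 1.0      # equals 1 on the integration range (-pi, pi)
wave_train = math.sin    # equals sin x on the integration range [-6 pi, 6 pi]

k = 1.0
print(fourier_transform(lorentzian, k, -200.0, 200.0), 0.25 * math.exp(-2.0 * abs(k)))
k = 1.5
print(fourier_transform(box, k, -math.pi, math.pi), math.sin(math.pi * k) / (math.pi * k))
k = 0.25
print(fourier_transform(wave_train, k, -6 * math.pi, 6 * math.pi),
      1j * math.sin(6 * math.pi * k) / (math.pi * (1 - k * k)))
```

Each line prints the numerical estimate followed by the closed-form result; the first pair agrees only up to the error made by cutting off the slowly decaying tail.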


3.2.2 Some properties of the Fourier transform 


In this section we will discuss some basic properties of the Fourier transform. 
All the basic properties of Fourier series extend in some way to the Fourier 
integral. Although we will not discuss all of them, it would be an instructive 
exercise nevertheless to try and guess and prove the extensions by yourself. 

The first basic property is that if $\hat f(k)$ is the Fourier transform of $f(x)$,
then $\hat f(-k)^*$ is the Fourier transform of $f(x)^*$. This follows simply by taking
the complex conjugate of the Fourier integral (3.25):
$$\hat f(k)^* = \frac{1}{2\pi} \int_{-\infty}^{\infty} f(x)^*\, e^{i k x}\, dx = \mathcal{F}\{f(x)^*\}(-k) ;$$
whence $\hat f(-k)^* = \mathcal{F}\{f(x)^*\}(k)$. Therefore we conclude that if $f(x)$ is real,
then $\hat f(k)^* = \hat f(-k)$.

Suppose that $f'(x) = \frac{df}{dx}$ is also square-integrable. Its Fourier transform
is given by
$$\mathcal{F}\{f'(x)\}(k) = \frac{1}{2\pi} \int_{-\infty}^{\infty} f'(x)\, e^{-i k x}\, dx .$$
Let us integrate by parts:
$$\mathcal{F}\{f'(x)\}(k) = \frac{1}{2\pi} \int_{-\infty}^{\infty} (i k)\, f(x)\, e^{-i k x}\, dx ,$$
where we have dropped the boundary terms since $f(x)$ is square-integrable
and hence vanishes in the limit $|x| \to \infty$. In other words,
$$\mathcal{F}\{f'(x)\}(k) = i k\, \mathcal{F}\{f(x)\}(k) . \tag{3.28}$$
More generally, if the $n$-th derivative $f^{(n)}(x)$ is square-integrable, then
$$\mathcal{F}\{f^{(n)}(x)\}(k) = (i k)^n\, \mathcal{F}\{f(x)\}(k) . \tag{3.29}$$


This is one of the most useful properties of the Fourier transform, since it 
will allow us to turn differential equations into algebraic equations. 
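The derivative property (3.28) can be checked numerically on a rapidly decaying function. The Python sketch below uses a Gaussian as a hypothetical test function, for which the transform in the convention (3.25) is $e^{-k^2/4}/(2\sqrt{\pi})$; the cutoff, the midpoint rule and the names are assumptions.

```python
import cmath
import math

def transform(f, k, cutoff=10.0, samples=4000):
    """(1/(2 pi)) * integral of f(x) exp(-i k x) dx over [-cutoff, cutoff],
    evaluated with the midpoint rule."""
    h = 2 * cutoff / samples
    total = 0j
    for j in range(samples):
        x = -cutoff + (j + 0.5) * h
        total += f(x) * cmath.exp(-1j * k * x)
    return total * h / (2 * math.pi)

gauss = lambda x: math.exp(-x * x)                 # hypothetical test function
gauss_prime = lambda x: -2 * x * math.exp(-x * x)  # its derivative

k = 1.3
print(transform(gauss_prime, k), 1j * k * transform(gauss, k))
```

The two printed complex numbers should coincide to high accuracy, since the Gaussian is negligible beyond the cutoff.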


Another version of the Dirac delta function 


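Let $f(x)$ be a continuous square-integrable function. In this case, the Fourier
inversion theorem says that the inversion formula is valid, so that
$$f(x) = \int_{-\infty}^{\infty} \hat f(k)\, e^{i k x}\, dk .$$
If we insert the definition of the Fourier transform $\hat f(k)$ in this equation, we
obtain
$$f(x) = \int_{-\infty}^{\infty} \left[ \frac{1}{2\pi} \int_{-\infty}^{\infty} f(y)\, e^{-i k y}\, dy \right] e^{i k x}\, dk .$$
If $f$ is in addition sufficiently well-behaved,³ we can exchange the order of
integrations to obtain
$$f(x) = \int_{-\infty}^{\infty} \left[ \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{i k (x - y)}\, dk \right] f(y)\, dy = \int_{-\infty}^{\infty} \delta(x - y)\, f(y)\, dy ,$$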


where we have introduced the Dirac delta function
$$\delta(x) \equiv \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{i k x}\, dk .$$

³ Technically, it is enough that $f$ belong to the Schwartz class, consisting of those
infinitely differentiable functions which decay, together with all their derivatives, sufficiently
fast at infinity.

Notice that we can also write this as
$$\delta(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-i k x}\, dk , \tag{3.30}$$


which makes it clear that it is the Fourier transform of the constant function 
f(x) = 1. Of course, this function is not square-integrable, so this statement 
is purely formal. We should not expect anything better because the Dirac 
delta function is not a function. This version of the Dirac delta function is 
adapted to the integral over the whole real line, as opposed to the one defined 
by equation (3.19), which is adapted to a finite interval. 


Parseval’s identity revisited 


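Another result from Fourier series which extends in some fashion to the
Fourier integral transform is the one in equation (3.18). We will first attempt
to show that the Fourier transform of a square-integrable function is itself
square-integrable. Let us compute
$$\begin{aligned}
\|\hat f\|^2 &= \int_{-\infty}^{\infty} |\hat f(k)|^2\, dk \\
&= \int_{-\infty}^{\infty} \left| \frac{1}{2\pi} \int_{-\infty}^{\infty} f(x)\, e^{-i k x}\, dx \right|^2 dk \\
&= \frac{1}{(2\pi)^2} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x)\, f(y)^*\, e^{-i k x}\, e^{i k y}\, dx\, dy\, dk .
\end{aligned}$$
Being somewhat cavalier, let us interchange the order of integration so that
we do the $k$-integral first. Recognising the result as $2\pi \delta(x - y)$, with $\delta(x - y)$
the delta function of (3.30), we can simplify this to
$$\|\hat f\|^2 = \frac{1}{2\pi} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x)\, f(y)^*\, \delta(x - y)\, dx\, dy = \frac{1}{2\pi} \int_{-\infty}^{\infty} |f(x)|^2\, dx = \frac{1}{2\pi}\, \|f\|^2 .$$
Therefore since $\|f\|^2$ is finite, so is $\|\hat f\|^2$, and moreover their norms are related
by Parseval’s identity:
$$\|\hat f\|^2 = \frac{1}{2\pi}\, \|f\|^2 , \tag{3.31}$$
which is the integral version of equation (3.18).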
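As a check of (3.31), one can take the first example of the previous section, $f(x) = 1/(4 + x^2)$ with $\hat f(k) = \tfrac14 e^{-2|k|}$, and integrate both sides numerically. The Python sketch below is illustrative only; the cutoffs and the midpoint rule are assumptions.

```python
import math

def integrate(g, a, b, samples=100000):
    """Midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / samples
    return sum(g(a + (j + 0.5) * h) for j in range(samples)) * h

f = lambda x: 1.0 / (4.0 + x * x)
f_hat = lambda k: 0.25 * math.exp(-2.0 * abs(k))  # transform found in the examples

norm_f = integrate(lambda x: f(x) ** 2, -200.0, 200.0)        # tail beyond |x| = 200 is negligible
norm_f_hat = integrate(lambda k: f_hat(k) ** 2, -20.0, 20.0)  # integrand is tiny beyond |k| = 20
print(norm_f_hat, norm_f / (2 * math.pi))  # both should be close to 1/32
```

Here $\|f\|^2 = \pi/16$ as remarked earlier, so both sides of (3.31) equal $1/32$.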


For many applications this factor of $1/2\pi$ is a nuisance and one redefines the Fourier
transform so that
$$\mathcal{F}\{f\}(k) \equiv \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(x)\, e^{-i k x}\, dx ,$$
and the inversion formula is more symmetrical
$$f(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \mathcal{F}\{f\}(k)\, e^{i k x}\, dk .$$
In this case, Parseval’s identity becomes simply
$$\|\mathcal{F}\{f\}\|^2 = \|f\|^2 .$$


© © One should mention that the Fourier transform is an isometry from $L^2$ to $L^2$.


3.2.3 Application: one-dimensional wave equation 


Let us now illustrate the use of the Fourier transform to solve partial differential
equations by considering the one-dimensional wave equation (3.1)
again. This time, however, we are not imposing the boundary conditions (3.2)
for $x$. Instead we may impose that at each moment in time $t$, $\psi(x,t)$
is square-integrable, which is roughly equivalent to saying that the wave has
a finite amount of energy. As initial conditions we will again impose (3.3),
where $f(x)$ is a square-integrable function.

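We will analyse this problem by taking the Fourier transform of the wave
equation. From equation (3.29) with $n = 2$ we have that
$$\mathcal{F}\left\{ \frac{\partial^2}{\partial x^2}\, \psi(x,t) \right\} = -k^2\, \hat\psi(k,t) ,$$
where
$$\hat\psi(k,t) \equiv \frac{1}{2\pi} \int_{-\infty}^{\infty} \psi(x,t)\, e^{-i k x}\, dx$$
is the Fourier transform of $\psi(x,t)$. Similarly, taking the derivative inside the
integral,
$$\mathcal{F}\left\{ \frac{\partial^2}{\partial t^2}\, \psi(x,t) \right\} = \frac{\partial^2}{\partial t^2}\, \hat\psi(k,t) .$$
Therefore the wave equation becomes
$$\frac{1}{c^2}\, \frac{\partial^2}{\partial t^2}\, \hat\psi(k,t) = -k^2\, \hat\psi(k,t) .$$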

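The most general solution is given by a linear combination of two sinusoids:

$$
\hat\psi(k,t) = \hat a(k) \cos kct + \hat b(k) \sin kct ,
$$

where the “constants” $\hat a$ and $\hat b$ can still depend on k. The first of the initial
conditions (3.3) implies that

$$
\left. \frac{\partial \hat\psi(k,t)}{\partial t} \right|_{t=0} = 0 ,
$$

whence we have that $\hat b(k) = 0$. Using the inversion formula (3.26), we can
write

$$
\psi(x,t) = \int_{-\infty}^{\infty} \hat a(k) \cos kct \; e^{ikx}\, dk .
$$

Evaluating at t = 0, we have that

$$
\psi(x,0) = f(x) = \int_{-\infty}^{\infty} \hat a(k)\, e^{ikx}\, dk ,
$$

whence comparing with the inversion formula (3.26), we see that $\hat a(k) = \hat f(k)$,
so that

$$
\psi(x,t) = \int_{-\infty}^{\infty} \hat f(k) \cos kct \; e^{ikx}\, dk , \tag{3.32}
$$

where

$$
\hat f(k) = \frac{1}{2\pi} \int_{-\infty}^{\infty} f(x)\, e^{-ikx}\, dx .
$$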


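Inserting back this expression into the solution (3.32) and interchanging the
order of integration, we have

$$
\psi(x,t) = \int_{-\infty}^{\infty} \left[ \frac{1}{2\pi} \int_{-\infty}^{\infty} f(y)\, e^{-iky}\, dy \right] \cos kct \; e^{ikx}\, dk
= \int_{-\infty}^{\infty} \left[ \frac{1}{2\pi} \int_{-\infty}^{\infty} \cos kct \; e^{ik(x-y)}\, dk \right] f(y)\, dy
= \int_{-\infty}^{\infty} K(x-y,t)\, f(y)\, dy ,
$$

where we have introduced the propagator K(x,t) defined by

$$
K(x,t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \cos kct \; e^{ikx}\, dk .
$$

Notice that K(x,t) clearly satisfies the wave equation (3.1):

$$
\frac{\partial^2}{\partial x^2} K(x,t) = \frac{1}{c^2} \frac{\partial^2}{\partial t^2} K(x,t) ,
$$

with initial conditions

$$
\left. \frac{\partial K(x,t)}{\partial t} \right|_{t=0} = 0
\qquad\text{and}\qquad
K(x,0) = \delta(x) ,
$$

according to (3.30).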
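The solution (3.32) can be tested numerically. The sketch below assumes a Gaussian initial profile, for which $\hat f(k)$ is known in closed form under the convention of this section, and compares the k-integral in (3.32) with the d'Alembert form $\frac12[f(x-ct) + f(x+ct)]$. That closed form is a standard result which is not derived in these notes and is used here only as a reference value; the wave speed `C` and the helper names are likewise illustrative assumptions.

```python
import math

C = 1.0  # wave speed (arbitrary illustrative value)

def trapezoid(func, lo, hi, n):
    """Composite trapezoid rule for a real-valued function on [lo, hi]."""
    h = (hi - lo) / n
    total = 0.5 * (func(lo) + func(hi))
    for j in range(1, n):
        total += func(lo + j * h)
    return total * h

def f(x):
    # Assumed initial profile psi(x, 0).
    return math.exp(-0.5 * x * x)

def f_hat(k):
    # Closed form of (1/2pi) * integral f(x) e^{-ikx} dx for the Gaussian above.
    return math.exp(-0.5 * k * k) / math.sqrt(2.0 * math.pi)

def psi(x, t):
    # Equation (3.32); f_hat is real and even, so e^{ikx} may be replaced by cos(kx).
    integrand = lambda k: f_hat(k) * math.cos(k * C * t) * math.cos(k * x)
    return trapezoid(integrand, -12.0, 12.0, 4000)

def dalembert(x, t):
    # Reference value: two half-amplitude copies of f travelling left and right.
    return 0.5 * (f(x - C * t) + f(x + C * t))

for x, t in [(0.0, 0.0), (1.0, 0.5), (-2.0, 1.5), (3.0, 2.0)]:
    print(x, t, psi(x, t), dalembert(x, t))
```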

3.2.4 Application: steady-state response


Another useful application of the Fourier transform is to solve for the
steady-state solutions of linear ordinary differential equations. Suppose that we have
a system governed by a differential equation

$$
\frac{d^2\phi(t)}{dt^2} + a_1 \frac{d\phi(t)}{dt} + a_0\, \phi(t) = f(t) ,
$$

where f(t) is some driving term. We saw that when f(t) is periodic we
can use the method of Fourier series in order to solve for $\phi(t)$. If f(t) is
not periodic, then it makes sense that we try and use the Fourier integral
transform. Let us define the Fourier transform $\hat\phi(\omega)$ of $\phi(t)$ by

$$
\mathcal{F}\{\phi\}(\omega) \equiv \hat\phi(\omega) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \phi(t)\, e^{-i\omega t}\, dt .
$$


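Similarly, let $\hat f(\omega)$ denote the Fourier transform of f(t). Then we can take
the Fourier transform of the differential equation and we obtain an algebraic
equation for $\hat\phi(\omega)$:

$$
-\omega^2\, \hat\phi(\omega) + i a_1 \omega\, \hat\phi(\omega) + a_0\, \hat\phi(\omega) = \hat f(\omega) ,
$$

which can be readily solved to yield

$$
\hat\phi(\omega) = \frac{1}{-\omega^2 + i a_1 \omega + a_0}\, \hat f(\omega) .
$$

Now we can transform back via the inversion formula

$$
\phi(t) = \int_{-\infty}^{\infty} \hat\phi(\omega)\, e^{i\omega t}\, d\omega
= \int_{-\infty}^{\infty} \frac{e^{i\omega t}}{-\omega^2 + i a_1 \omega + a_0}\, \hat f(\omega)\, d\omega .
$$

Using the definition of $\hat f(\omega)$, we have

$$
\phi(t) = \int_{-\infty}^{\infty} \frac{e^{i\omega t}}{-\omega^2 + i a_1 \omega + a_0}
\left[ \frac{1}{2\pi} \int_{-\infty}^{\infty} f(\tau)\, e^{-i\omega\tau}\, d\tau \right] d\omega .
$$

If, as we have been doing without justification in this part of the course, we
interchange the order of integration, we obtain

$$
\phi(t) = \int_{-\infty}^{\infty} G(t-\tau)\, f(\tau)\, d\tau ,
$$

where we have introduced the Green’s function G(t), defined by

$$
G(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{e^{i\omega t}}{-\omega^2 + i a_1 \omega + a_0}\, d\omega .
$$

Notice that as in the case of the Fourier series, G(t) satisfies the equation

$$
\frac{d^2 G(t)}{dt^2} + a_1 \frac{dG(t)}{dt} + a_0\, G(t) = \delta(t) ,
$$

so that it is the response of the system to a delta function input.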
As a concrete illustration of the method, let us find a steady-state solution
to the following differential equation:

$$
\frac{d^2\phi(t)}{dt^2} + 2 \frac{d\phi(t)}{dt} + 2\phi(t) = f(t) ,
$$

where f(t) is the pulse defined in (3.27). Let us first compute the Green’s
function for this system:

$$
G(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{e^{i\omega t}}{-\omega^2 + 2i\omega + 2}\, d\omega .
$$

We can compute this using the residue theorem and, in particular, equation
(2.64). The integrand has simple poles at $i \pm 1$, which lie in the upper
half-plane. Therefore it follows immediately from equation (2.64), that G(t) = 0
for t < 0. For t > 0, we have that

$$
G(t) = \frac{1}{2\pi}\, 2\pi i \left[ \mathrm{Res}(i+1) + \mathrm{Res}(i-1) \right] .
$$

Writing the denominator as $-(\omega - i - 1)(\omega - i + 1)$, we compute the residues to be

$$
\mathrm{Res}(i \pm 1) = \mp \frac{1}{2}\, e^{-t \pm it} ,
$$

whence for t > 0, we have

$$
G(t) = e^{-t} \sin t .
$$

In summary, the Green’s function for this system is

$$
G(t) = \begin{cases} 0 , & \text{for } t < 0 \text{, and} \\ e^{-t} \sin t , & \text{for } t \ge 0 . \end{cases}
$$


Notice that although it is continuous at t = 0, its first derivative is not
continuous there, and hence the second derivative does not exist at t = 0.
This is to be expected, since the second derivative of G(t) at t = 0 is related
to the delta function, which is not a function. In any case, we can now
integrate this against the pulse f(t) to find the solution:

$$
\phi(t) = \int_{-\infty}^{\infty} G(t-\tau)\, f(\tau)\, d\tau = \int_{-\pi}^{\pi} G(t-\tau)\, d\tau .
$$


Taking into account that G(t) = 0 for t < 0, we are forced to distinguish
between three epochs: $t < -\pi$, $-\pi \le t \le \pi$, and $t > \pi$, corresponding to the
time before the pulse, during the pulse and after the pulse. We can perform
the integral in each of these three epochs with the following results:

$$
\phi(t) = \begin{cases}
0 , & \text{for } t < -\pi , \\
\frac{1}{2} + \frac{1}{2}\, e^{-\pi - t} (\cos t + \sin t) , & \text{for } t \in [-\pi, \pi] \text{, and} \\
-e^{-t} \sinh \pi \, (\cos t + \sin t) , & \text{for } t > \pi .
\end{cases}
$$


Notice that before the pulse the system is at rest, and that after the pulse 
the response dies off exponentially. This is as we expect for a steady-state 
response to an input of finite duration. 
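The three-epoch formula can be cross-checked by doing the convolution integral numerically. The sketch below assumes the sign convention in which the Green's function is $G(t) = e^{-t}\sin t$ for $t > 0$ (the one for which $G'' + 2G' + 2G = \delta(t)$), and takes the pulse to be equal to 1 on $[-\pi, \pi]$ and zero elsewhere; the function names are illustrative.

```python
import math

def green(t):
    # Assumed Green's function of D^2 + 2D + 2: causal, e^{-t} sin t for t > 0.
    return math.exp(-t) * math.sin(t) if t > 0 else 0.0

def response_numeric(t, n=20000):
    # phi(t) = integral over the pulse support of G(t - tau), midpoint rule.
    h = 2.0 * math.pi / n
    return h * sum(green(t - (-math.pi + (j + 0.5) * h)) for j in range(n))

def response_closed(t):
    # The three epochs: before, during and after the pulse.
    if t < -math.pi:
        return 0.0
    if t <= math.pi:
        return 0.5 + 0.5 * math.exp(-math.pi - t) * (math.cos(t) + math.sin(t))
    return -math.exp(-t) * math.sinh(math.pi) * (math.cos(t) + math.sin(t))

for t in (-4.0, 0.0, 2.0, 5.0):
    print(t, response_numeric(t), response_closed(t))
```

During a very long pulse the response would settle at $\phi = 1/2$, which is the constant solution of $\phi'' + 2\phi' + 2\phi = 1$; this is a useful check on the overall sign.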


3.3 The Laplace transform 


In the previous section we introduced the Fourier transform as a tool to 
find steady-state solutions to differential equations. These solutions can be 
interpreted as the response of a system which has been driven for such a 
long time that any transient solutions have died out. In many systems, 
however, one is also interested in the transient solutions, and in any case, 
mathematically one usually finds the most general solution of the differential 
equation. The Laplace transform will allow us to do this. In many ways the 
Laplace transform is reminiscent of the Fourier transform, with the important 
difference that it incorporates in a natural way the initial conditions. 


3.3.1 The Heaviside D-calculus 


Let us start by presenting the D-calculus introduced by Heaviside. The 
justification for this method is the Laplace transform. An example should 
suffice to illustrate the method, but first we need to introduce a little bit of 
notation.


Differential operators


The result of taking the derivative of a function is another function: for 
example, $\frac{d}{dt}(t^n) = n t^{n-1}$ or $\frac{d}{dt} \sin t = \cos t$. Therefore we can think
of the derivative as some sort of machine to which one feeds a function as 
input and gets another function in return. Such machines are generally called 
operators. It is convenient to introduce symbols for operators and, in the 
case of the derivative operator, it is customary to call it D. Therefore, if f is 
a function, Df is the function one obtains by having D act on f. A function 
is defined by specifying its values at every point t. In the case of Df we have 


$$
Df(t) = \frac{df(t)}{dt} .
$$


Operators can be composed. For example we can consider $D^2$ to be the
operator which acting on a function f gives $D^2 f = D(Df)$, or

$$
D^2 f(t) = D(Df)(t) = \frac{d}{dt}\left( \frac{df(t)}{dt} \right) = \frac{d^2 f(t)}{dt^2} .
$$


Therefore $D^2$ is the second derivative. Operators can be multiplied by
functions, and in particular, by constants. If a is a constant, the operator aD is
defined by

$$
(aD) f(t) = a\, Df(t) = a\, \frac{df(t)}{dt} .
$$

Similarly, if g(t) is a function, then the operator gD is defined by

$$
(gD) f(t) = g(t)\, Df(t) = g(t)\, \frac{df(t)}{dt} .
$$


Operators can also be added: if g and h are functions, then the expression
$gD^2 + hD$ is an operator, defined by

$$
(gD^2 + hD) f(t) = g(t)\, \frac{d^2 f(t)}{dt^2} + h(t)\, \frac{df(t)}{dt} .
$$


In other words, linear combinations of operators are again operators. Oper- 
ators which are formed by linear combinations with function coefficients of 
D and its powers are known as differential operators. A very important 
property shared by all differential operators is that they are linear. Let us 
consider the derivative operator D, and let f(t) and g(t) be functions. Then, 


$$
D(f+g)(t) = \frac{d\,(f(t) + g(t))}{dt} = \frac{df(t)}{dt} + \frac{dg(t)}{dt} = Df(t) + Dg(t) .
$$

In other words, D(f + g) = Df + Dg. Similarly, it is easy to see that this is
still true for any power of D and for any linear combination of powers of D.
In summary, differential operators are linear. 

The highest power of D which occurs in a differential operator is called 
the order of the differential operator. This agrees with the nomenclature 
used for differential equations. In fact, a second order ordinary differential 
equation, like this one

$$
a(t)\, \frac{d^2 f(t)}{dt^2} + b(t)\, \frac{df(t)}{dt} + c(t)\, f(t) = h(t) ,
$$

can be rewritten as an operator equation K f(t) = h(t), where we have
introduced the second order differential operator $K = a D^2 + b D + c$.


An example 


Suppose we want to solve the following differential equation

$$
\frac{d^2 f(t)}{dt^2} + 3 \frac{df(t)}{dt} + 2 f(t) = e^{it} . \tag{3.33}
$$

We first write it down as an operator equation:

$$
(D^2 + 3D + 2) f(t) = e^{it} .
$$

Next we will manipulate the operator formally as if D were a variable and
not an operator:

$$
D^2 + 3D + 2 = (D+2)(D+1) ;
$$

whence formally

$$
f(t) = \frac{1}{(D+2)(D+1)}\, e^{it} = \left[ \frac{1}{D+1} - \frac{1}{D+2} \right] e^{it} , \tag{3.34}
$$

where we have used a partial fraction expansion: remember we are treating
D as if it were a variable z, say. Now we do something even more suspect
and expand each of the simple fractions using a geometric series:

$$
\frac{1}{D+1} = \sum_{j=0}^{\infty} (-1)^j D^j
\qquad\text{and}\qquad
\frac{1}{D+2} = \sum_{j=0}^{\infty} \frac{(-1)^j}{2^{j+1}} D^j .
$$

Now notice that $D\, e^{it} = i\, e^{it}$; hence

$$
\frac{1}{D+1}\, e^{it} = \sum_{j=0}^{\infty} (-1)^j D^j e^{it} = \sum_{j=0}^{\infty} (-1)^j i^j e^{it} = \frac{1}{i+1}\, e^{it} ,
$$

$$
\frac{1}{D+2}\, e^{it} = \sum_{j=0}^{\infty} \frac{(-1)^j}{2^{j+1}} D^j e^{it} = \sum_{j=0}^{\infty} \frac{(-1)^j i^j}{2^{j+1}} e^{it} = \frac{1}{i+2}\, e^{it} .
$$

Therefore, substituting into equation (3.34), we obtain

$$
f(t) = \left[ \frac{1}{i+1} - \frac{1}{i+2} \right] e^{it} = \frac{1-3i}{10}\, e^{it} ,
$$

which can be checked to obey equation (3.33) by direct substitution.
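The direct substitution is a one-line computation with complex numbers, and it is instructive to look at the two geometric series at the same time. In the sketch below (the helper name `partial_sums` is an assumption for illustration) the series for $1/(D+2)$ converges when D is replaced by i, whereas the partial sums of the series for $1/(D+1)$ merely cycle through four values whose average happens to equal $1/(1+i)$. This is one way of seeing why the manipulation was described as suspect.

```python
# Coefficient of e^{it} in the particular solution found by the D-calculus.
c = (1 - 3j) / 10

# Acting on e^{it}, D multiplies by i, so (D^2 + 3D + 2) c e^{it} = (i^2 + 3i + 2) c e^{it}.
# The residual below should vanish if c e^{it} solves (3.33).
residual = ((1j) ** 2 + 3 * (1j) + 2) * c - 1

def partial_sums(ratio, first_term, terms):
    """Partial sums of the geometric series first_term * (1 + ratio + ratio^2 + ...)."""
    sums, total, term = [], 0, first_term
    for _ in range(terms):
        total += term
        sums.append(total)
        term *= ratio
    return sums

# Series for 1/(D+2) with D -> i: sum of (-1)^j i^j / 2^{j+1}, ratio -i/2.
series_d2 = partial_sums(-0.5j, 0.5, 60)
# Series for 1/(D+1) with D -> i: sum of (-1)^j i^j, ratio -i (modulus 1, no convergence).
series_d1 = partial_sums(-1j, 1.0, 8)

print(abs(residual))
print(series_d2[-1], 1 / (2 + 1j))
print(series_d1)
```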

Of course this is only a particular solution to the differential equation
(3.33). In order to obtain the most general solution we have to add to
it the complementary solution, which is the most general solution of the
associated homogeneous equation:

$$
K f(t) = (D+1)(D+2) f(t) = \frac{d^2 f(t)}{dt^2} + 3 \frac{df(t)}{dt} + 2 f(t) = 0 .
$$

The reason for this is that if g(t) solves the equation Kg(t) = 0, and $K f(t) = e^{it}$,
then, by linearity, $K(f+g)(t) = K f(t) + K g(t) = e^{it} + 0 = e^{it}$. To find
the complementary solution, notice that


(D +1)(D + 2)f(t) =0 


has two kinds of solutions:

$$
(D+1) f_1(t) = 0 \qquad\text{and}\qquad (D+2) f_2(t) = 0 .
$$

The solutions of these first order equations can be read off immediately:

$$
f_1(t) = a\, e^{-t} \qquad\text{and}\qquad f_2(t) = b\, e^{-2t} ,
$$

where the constants a and b are to be determined from the initial conditions:
f(0) and f′(0), say. In summary, we have the following general solution to
the differential equation (3.33):

$$
f(t) = \frac{1-3i}{10}\, e^{it} + a\, e^{-t} + b\, e^{-2t} , \tag{3.35}
$$

which can be checked explicitly to solve the differential equation (3.33). Notice
that the first term corresponds to the steady-state response and the last
two terms are transient.


3.3.2 The Laplace transform 


The D-calculus might seem a little suspect, but it can be justified by the use 
of the Laplace transform, which we define as follows:

$$
\mathcal{L}\{f\}(s) \equiv F(s) = \int_0^{\infty} f(t)\, e^{-st}\, dt , \tag{3.36}
$$


provided that the integral exists. This might restrict the values of s for which 
the transform exists. 

A function f(t) is said to be of exponential order if there exist real 
constants M and a for which 


$$
|f(t)| \le M e^{at} .
$$


It is not hard to see that if f(t) is of exponential order, then the Laplace 
transform F(s) of f(t) exists provided that Re(s) > a. 


© To see this let us estimate the integral

$$
|F(s)| = \left| \int_0^{\infty} f(t)\, e^{-st}\, dt \right|
\le \int_0^{\infty} |f(t)|\, |e^{-st}|\, dt
\le M \int_0^{\infty} e^{at}\, e^{-\mathrm{Re}(s)\, t}\, dt .
$$

Provided that Re(s) > a, this integral exists and

$$
|F(s)| \le \frac{M}{\mathrm{Re}(s) - a} .
$$

Notice that in particular, in the limit $\mathrm{Re}(s) \to \infty$, $F(s) \to 0$. This can be proven in more
generality, so that if a function F(s) does not approach 0 in the limit $\mathrm{Re}(s) \to \infty$, it
cannot be the Laplace transform of any function f(t).


We postpone a more complete discussion of the properties of the Laplace 
transform until later, but for now let us note the few properties we will need 
to justify the D-calculus solution of the differential equation (3.33) above. 

The first important property is that the Laplace transform is linear. 
Clearly, if f(t) and g(t) are functions whose Laplace transforms F(s) and 
G(s) exist, then for those values of s for which both F(s) and G(s) exist, we 
have that 


$$
\mathcal{L}\{f+g\}(s) = \mathcal{L}\{f\}(s) + \mathcal{L}\{g\}(s) = F(s) + G(s) .
$$


Next let us consider the function f(t) = exp(at) where a is some complex
number. This function is of exponential order, so that its Laplace transform
exists provided that Re(s) > Re(a). This being the case, we have that

$$
\mathcal{L}\{e^{at}\}(s) = \int_0^{\infty} e^{at}\, e^{-st}\, dt = \frac{1}{s-a} . \tag{3.37}
$$

Suppose now that f(t) is a differentiable function. Let us try to compute
the Laplace transform of its derivative f′(t). By definition,

$$
\mathcal{L}\{f'\}(s) = \int_0^{\infty} f'(t)\, e^{-st}\, dt ,
$$

which can be integrated by parts to obtain

$$
\mathcal{L}\{f'\}(s) = s \int_0^{\infty} f(t)\, e^{-st}\, dt + \Big. f(t)\, e^{-st} \Big|_0^{\infty}
= s\, \mathcal{L}\{f\}(s) - f(0) + \lim_{t\to\infty} f(t)\, e^{-st} .
$$


Provided the last term is zero, which might imply conditions on f and/or s, 
we have that 


$$
\mathcal{L}\{f'\}(s) = s\, \mathcal{L}\{f\}(s) - f(0) . \tag{3.38}
$$


We can iterate this expression in order to find the Laplace transform of 
higher derivatives of f(t). For example, the Laplace transform of the second 
derivative is easy to find by understanding f”(t) as the first derivative of 
f'(t) and iterating the above formula: 


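$$
\mathcal{L}\{f''\}(s) = s\, \mathcal{L}\{f'\}(s) - f'(0)
= s \left( s\, \mathcal{L}\{f\}(s) - f(0) \right) - f'(0)
= s^2\, \mathcal{L}\{f\}(s) - s f(0) - f'(0) ,
$$

provided that f(t) exp(−st) and f′(t) exp(−st) both go to zero in the limit
$t \to \infty$.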
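Both (3.37) and (3.38) are easy to test numerically by truncating the integral (3.36) at a large upper limit. The sketch below does this with a plain trapezoid rule; the sample values of a and s, the cut-off `upper` and the function names are assumptions chosen so that Re(s) > Re(a) and the integrand has decayed to a negligible size at the cut-off.

```python
import cmath

def laplace(func, s, upper=40.0, n=40000):
    """Trapezoid approximation of the integral of func(t) e^{-st} over [0, upper]."""
    h = upper / n
    total = 0.5 * (func(0.0) + func(upper) * cmath.exp(-s * upper))
    for j in range(1, n):
        t = j * h
        total += func(t) * cmath.exp(-s * t)
    return total * h

a = -0.5 + 2.0j   # illustrative complex exponent
s = 1.0 + 0.5j    # illustrative point with Re(s) > Re(a)

f = lambda t: cmath.exp(a * t)
df = lambda t: a * cmath.exp(a * t)   # derivative of f

F = laplace(f, s)
lhs = laplace(df, s)      # transform of f'
rhs = s * F - f(0.0)      # right-hand side of (3.38)

print(F, 1 / (s - a))     # (3.37)
print(lhs, rhs)           # (3.38)
```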


The D-calculus justified 


We are now ready to justify the D-calculus solution of the previous section. 
This serves also to illustrate how to solve initial value problems using the 
Laplace transform. 

Consider again the differential equation (3.33):

$$
\frac{d^2 f(t)}{dt^2} + 3 \frac{df(t)}{dt} + 2 f(t) = e^{it} ,
$$

and let us take the Laplace transform of both sides of the equation. Since
the Laplace transform is linear, we can write this as

$$
\mathcal{L}\{f''\}(s) + 3\, \mathcal{L}\{f'\}(s) + 2\, \mathcal{L}\{f\}(s) = \mathcal{L}\{e^{it}\}(s) .
$$

Letting F(s) denote the Laplace transform of f, we can use equations (3.37)
and (3.38) to rewrite this as

$$
s^2 F(s) - s f(0) - f'(0) + 3 \left( s F(s) - f(0) \right) + 2 F(s) = \frac{1}{s-i} ,
$$

which can be solved for F(s):

$$
F(s) = \frac{1}{s^2 + 3s + 2} \left[ \frac{1}{s-i} + (s+3) f(0) + f'(0) \right] .
$$

Expanding this out, and factorising $s^2 + 3s + 2 = (s+1)(s+2)$, we have

$$
F(s) = \frac{1}{(s-i)(s+1)(s+2)} + \frac{(s+3) f(0) + f'(0)}{(s+1)(s+2)} .
$$

We now decompose this into partial fractions:

$$
F(s) = \frac{1-3i}{10}\, \frac{1}{s-i}
+ \frac{2 f(0) + f'(0) - \frac{1}{2}(1-i)}{s+1}
+ \frac{\frac{1}{5}(2-i) - f(0) - f'(0)}{s+2} .
$$

Using linearity again and (3.37) we can recognise this as the Laplace
transform of the function

$$
f(t) = \frac{1-3i}{10}\, e^{it}
+ \left[ 2 f(0) + f'(0) - \tfrac{1}{2}(1-i) \right] e^{-t}
+ \left[ \tfrac{1}{5}(2-i) - f(0) - f'(0) \right] e^{-2t} ,
$$

which agrees with (3.35) and moreover displays manifestly the dependence of
the coefficients a and b in that expression on the initial conditions.
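The partial fraction coefficients can be checked by confirming that the resulting function satisfies the differential equation and reproduces the prescribed initial data. The sketch below does so for arbitrary sample values of f(0) and f′(0); the function names and the sample values are assumptions for illustration, and the derivatives are written out by hand from the three exponentials.

```python
import cmath

def coefficients(f0, df0):
    """Coefficients of e^{it}, e^{-t} and e^{-2t} read off from the partial fractions."""
    steady = (1 - 3j) / 10
    a = 2 * f0 + df0 - (1 - 1j) / 2
    b = (2 - 1j) / 5 - f0 - df0
    return steady, a, b

def solution_and_derivatives(t, f0, df0):
    """Return f(t), f'(t), f''(t) for the solution with initial data f0, df0."""
    p, a, b = coefficients(f0, df0)
    e_it, e_1, e_2 = cmath.exp(1j * t), cmath.exp(-t), cmath.exp(-2 * t)
    f = p * e_it + a * e_1 + b * e_2
    df = 1j * p * e_it - a * e_1 - 2 * b * e_2
    d2f = -p * e_it + a * e_1 + 4 * b * e_2
    return f, df, d2f

f0, df0 = 0.7 - 0.2j, -1.3 + 0.4j   # arbitrary illustrative initial data

f_at_0, df_at_0, _ = solution_and_derivatives(0.0, f0, df0)
f, df, d2f = solution_and_derivatives(1.9, f0, df0)
ode_residual = d2f + 3 * df + 2 * f - cmath.exp(1j * 1.9)

print(f_at_0, df_at_0, abs(ode_residual))
```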


The inverse Laplace transform 


The Laplace transform is applicable to a wide range of initial value problems.
The main difficulty stems from inverting the transform, which might be
difficult. In practice one resorts to tables of Laplace transforms, like Table 3.1
below; but if this does not work, there is an inversion formula, as for the
Fourier transform, which we will state without proof. It says that if F(s) is
the Laplace transform of a function f(t), then one can recover the function
(except maybe at points of discontinuity) by

$$
f(t) = \frac{1}{2\pi i} \int_{-i\infty}^{i\infty} F(s)\, e^{st}\, ds ,
$$

where the integral is meant to be a contour integral along the imaginary axis.
In other words, parametrising s = iy, we have

$$
f(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} F(iy)\, e^{iyt}\, dy .
$$

It may happen, however, that the Laplace transform F(s) does not make sense
for Re(s) = 0, because the integral (3.36) does not converge. Suppose instead
that there is some positive real number a such that the Laplace transform
of $f(t)\, e^{-at}$ does exist for Re(s) = 0. In this case, we can use the inversion
formula to obtain

$$
f(t)\, e^{-at} = \frac{1}{2\pi i} \int_{-i\infty}^{i\infty} \mathcal{L}\{f(t)\, e^{-at}\}(s)\, e^{st}\, ds .
$$

Using the shift formula (3.43), $\mathcal{L}\{f(t)\, e^{-at}\}(s) = F(s+a)$, whence,
multiplying by $e^{at}$ on both sides of the inversion formula:

$$
f(t) = \frac{1}{2\pi i} \int_{-i\infty}^{i\infty} F(s+a)\, e^{(s+a)t}\, ds .
$$

Changing variables of integration to u = s + a, we have

$$
f(t) = \frac{1}{2\pi i} \int_{a-i\infty}^{a+i\infty} F(u)\, e^{ut}\, du , \tag{3.39}
$$

which can now be interpreted as a contour integral along the line Re(u) = a. In
other words, we can for free shift the original contour of integration to the
right until F(s) makes sense on it.


3.3.3 Basic properties of the Laplace transform 


We shall now discuss the basic properties of the Laplace transform. We have 
already seen that it is linear and we computed the transform of a simple 
exponential function exp(at) in equation (3.37). From this simple result, we
can compute the Laplace transforms of a few simple functions related to the 
exponential. 

Let $\omega$ be a real number. From the fact that $\exp(i\omega t) = \cos\omega t + i \sin\omega t$,
linearity of the Laplace transform implies that

$$
\mathcal{L}\{e^{i\omega t}\}(s) = \mathcal{L}\{\cos\omega t\}(s) + i\, \mathcal{L}\{\sin\omega t\}(s)
= \frac{1}{s - i\omega} = \frac{s}{s^2 + \omega^2} + i\, \frac{\omega}{s^2 + \omega^2} ,
$$

from where we can read off the Laplace transforms of $\cos\omega t$ and $\sin\omega t$. Notice
that these expressions are valid for Re(s) > 0, since this is the condition for
the existence of the Laplace transform of the exponential.

Similarly, let $\beta$ be a real number and recall the trigonometric identities
(2.16), from where we can deduce that

$$
\cosh\beta t = \cos i\beta t \qquad\text{and}\qquad \sinh\beta t = -i \sin i\beta t .
$$

As a result, we immediately see that the Laplace transforms of the hyperbolic
functions are given by

$$
\mathcal{L}\{\cosh\beta t\}(s) = \frac{s}{s^2 - \beta^2}
\qquad\text{and}\qquad
\mathcal{L}\{\sinh\beta t\}(s) = \frac{\beta}{s^2 - \beta^2} ,
$$

where the condition is now $\mathrm{Re}(s) > |\beta|$.
Putting a = 0 in (3.37), we see that the Laplace transform of the constant
function f(t) = 1, is given by

$$
\mathcal{L}\{1\}(s) = \frac{1}{s} ,
$$

which is valid for Re(s) > 0.
Suppose that f(t) has Laplace transform F(s). Then by taking derivatives
with respect to s of the expression (3.36) for F(s), we arrive at

$$
\mathcal{L}\{t^n f(t)\}(s) = (-1)^n F^{(n)}(s) , \tag{3.40}
$$

which is valid for those values of s for which the Laplace transform F(s) of
f(t) exists. In particular, if we take f(t) = 1, we arrive at

$$
\mathcal{L}\{t^n\}(s) = (-1)^n \frac{d^n}{ds^n} \frac{1}{s} = \frac{n!}{s^{n+1}} , \tag{3.41}
$$

valid for Re(s) > 0.
How about if n is a negative integer? Let us consider the Laplace
transform of g(t) = f(t)/t, and let us call it G(s). From equation (3.40) for n = 1,
we have that

$$
F(s) = \mathcal{L}\{f(t)\}(s) = \mathcal{L}\{t\, g(t)\}(s) = -G'(s) ,
$$

so that G(s) is an antiderivative for −F(s); that is,

$$
G(s) = -\int_a^s F(\sigma)\, d\sigma
$$

for some a. If we demand that G(s) vanishes in the limit $s \to \infty$, then we must choose
$a = \infty$, and hence

$$
\mathcal{L}\{f(t)/t\}(s) = \int_s^{\infty} F(\sigma)\, d\sigma . \tag{3.42}
$$


Another important property of the Laplace transform is the shifting
formula:

$$
\mathcal{L}\{e^{at} f(t)\}(s) = \mathcal{L}\{f(t)\}(s-a) = F(s-a) , \tag{3.43}
$$

which is evident from the definition (3.36) of the Laplace transform. Related
to this property is the following. Given a function f(t), let $\tau$ be a positive
real constant, and introduce the notion of the delayed function $f_\tau(t)$, defined
by

$$
f_\tau(t) = \begin{cases} f(t-\tau) , & \text{for } t \ge \tau \text{, and} \\ 0 , & \text{otherwise.} \end{cases} \tag{3.44}
$$

In other words, the delayed function is the same as the original function, but
it has been translated in time by $\tau$, hence the name. The Laplace transform
of the delayed function is given by

$$
\mathcal{L}\{f_\tau\}(s) = \int_0^{\infty} f_\tau(t)\, e^{-st}\, dt
= \int_\tau^{\infty} f(t-\tau)\, e^{-st}\, dt
= \int_0^{\infty} f(u)\, e^{-s(u+\tau)}\, du
= e^{-s\tau}\, \mathcal{L}\{f\}(s) ,
$$

where we have changed the variable of integration from t to $u = t - \tau$. In
other words,

$$
\mathcal{L}\{f_\tau\}(s) = e^{-s\tau} F(s) . \tag{3.45}
$$
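Although the delta function $\delta(t-\tau)$ is not a function, we can nevertheless
attempt to compute its Laplace transform:

$$
\mathcal{L}\{\delta(t-\tau)\}(s) = \int_0^{\infty} \delta(t-\tau)\, e^{-st}\, dt
= \begin{cases} e^{-s\tau} , & \text{if } \tau \ge 0 \text{, and} \\ 0 , & \text{otherwise.} \end{cases}
$$

Introducing the Heaviside step function $\theta(t)$, defined as

$$
\theta(t) = \begin{cases} 1 , & \text{for } t \ge 0 \text{, and} \\ 0 , & \text{for } t < 0 , \end{cases}
$$

we see that

$$
\mathcal{L}\{\delta(t-\tau)\}(s) = \theta(\tau)\, e^{-s\tau} .
$$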

Finally let us consider the Laplace transforms of integrals and
derivatives of functions. In the previous section we derived equation (3.38) for
the Laplace transform of the derivative f′(t) of a function f(t). Iterating
this expression we can find a formula for the Laplace transform of the n-th
derivative of a function:

$$
\mathcal{L}\{f^{(n)}\}(s) = s^n F(s) - \sum_{k=0}^{n-1} s^{n-1-k} f^{(k)}(0) , \tag{3.46}
$$

where by $f^{(0)}$ we mean the original function f. This formula is valid whenever
$\lim_{t\to\infty} f^{(k)}(t) \exp(-st) = 0$ for k = 0, 1, ..., n−1. How about integration? Consider the function

$$
g(t) = \int_0^t f(\tau)\, d\tau .
$$

What is its Laplace transform? We know that since g(t) is an antiderivative
for f, g′(t) = f(t) and moreover, from the definition, that g(0) = 0. Therefore
we can compute the Laplace transform of f(t) = g′(t) in two ways. On the
one hand it is simply F(s), but using (3.38) we can write

$$
\mathcal{L}\{f\}(s) = \mathcal{L}\{g'\}(s) = s\, \mathcal{L}\{g\}(s) - g(0) = s\, \mathcal{L}\{g\}(s) ,
$$

whence

$$
\mathcal{L}\left\{ \int_0^t f(\tau)\, d\tau \right\}(s) = \frac{F(s)}{s} .
$$

These properties are summarised in Table 3.1.


3.3.4 Application: stability and the damped oscillator 


In this section we will use the Laplace transform to characterise the notion 
of stability of a dynamical system which is governed by a linear ordinary 
differential equation. 

Many systems are governed by differential equations of the form 


$$
K f(t) = u(t) , \tag{3.47}
$$

where K is an n-th order differential operator which we will take, for
simplicity, to have constant coefficients and such that the coefficient of the term
of highest degree is 1; that is,

$$
K = D^n + a_{n-1} D^{n-1} + \cdots + a_1 D + a_0 .
$$

The differential equation (3.47) describes the output response f(t) of the
system to an input u(t). For the purposes of this section we will say that
a system is stable if in the absence of any input all solutions are transient;
that is,

$$
\lim_{t\to\infty} f(t) = 0 ,
$$

regardless of the initial conditions.


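Often one extends the notion of stability to systems for which f(t) remains bounded as
$t \to \infty$: for example, if the solutions oscillate; but we will not do this here. In any case,
the method we will employ extends trivially to this weaker notion of stability.

Function                        Transform                                                   Conditions
$f(t)$                          $F(s)$                                                      convergence
$e^{at}$                        $\frac{1}{s-a}$                                             $\mathrm{Re}(s) > \mathrm{Re}(a)$
$\cos\omega t$                  $\frac{s}{s^2+\omega^2}$                                    $\omega \in \mathbb{R}$ and $\mathrm{Re}(s) > 0$
$\sin\omega t$                  $\frac{\omega}{s^2+\omega^2}$                               $\omega \in \mathbb{R}$ and $\mathrm{Re}(s) > 0$
$\cosh\beta t$                  $\frac{s}{s^2-\beta^2}$                                     $\beta \in \mathbb{R}$ and $\mathrm{Re}(s) > |\beta|$
$\sinh\beta t$                  $\frac{\beta}{s^2-\beta^2}$                                 $\beta \in \mathbb{R}$ and $\mathrm{Re}(s) > |\beta|$
$t^n$                           $\frac{n!}{s^{n+1}}$                                        $n = 0, 1, \ldots$ and $\mathrm{Re}(s) > 0$
$e^{at} f(t)$                   $F(s-a)$                                                    convergence
$t^n f(t)$                      $(-1)^n F^{(n)}(s)$                                         same as for $F(s)$
$f(t)/t$                        $\int_s^{\infty} F(\sigma)\, d\sigma$                       same as for $F(s)$
$f_\tau(t)$                     $e^{-s\tau} F(s)$                                           $\tau > 0$ and same as for $F(s)$
$\delta(t-\tau)$                $\theta(\tau)\, e^{-s\tau}$                                 none
$f^{(n)}(t)$                    $s^n F(s) - \sum_{k=0}^{n-1} s^{n-1-k} f^{(k)}(0)$          $\lim_{t\to\infty} f^{(k)}(t)\, e^{-st} = 0$
$\int_0^t f(\tau)\, d\tau$      $\frac{F(s)}{s}$                                            same as for $F(s)$


Table 3.1: Some Laplace transforms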


Stability can be analysed using the Laplace transform. In order to see
this let us take the Laplace transform of the equation (3.47). Letting F(s)
and U(s) denote the Laplace transforms of f(t) and u(t) respectively, we
have

$$
(s^n + a_{n-1} s^{n-1} + \cdots + a_1 s + a_0)\, F(s) = U(s) + P(s) , \tag{3.48}
$$

where P(s) is a polynomial in s of order at most n − 1 depending on the
initial conditions: $f^{(k)}(0)$ for k = 0, 1, ..., n−1. In fact, a little bit of algebra
using equation (3.46) shows that

$$
P(s) = \sum_{i=0}^{n-1} p_i\, s^i
\qquad\text{with}\qquad
p_i = \sum_{j=0}^{n-i-1} a_{i+j+1}\, f^{(j)}(0) ,
$$

with the convention that $a_n = 1$. We will not need its explicit expression,
however. We can solve for F(s) in the transformed equation (3.48):

$$
F(s) = \frac{U(s)}{s^n + \cdots + a_0} + \frac{P(s)}{s^n + \cdots + a_0} .
$$


Notice that the first term in the right-hand side of the equation depends 
on the input, whereas the second term depends on the initial conditions. 
Moreover the common denominator depends only on the differential operator 
K; that is, it is intrinsic to the system. It is convenient to define the function

$$
H(s) = \frac{1}{s^n + a_{n-1} s^{n-1} + \cdots + a_1 s + a_0} .
$$

It is called the transfer function of the system and it encodes a great deal of
information about the qualitative dynamics of the system. In particular we
will be able to characterise the stability of the system by studying the
poles of the transfer function in the complex s-plane.

Let us start with the case of a first order equation:

$$
(D + a_0) f(t) = u(t) .
$$

Taking the Laplace transform and solving for the Laplace transform F(s) of
f(t) we have

$$
F(s) = \frac{U(s)}{s + a_0} + \frac{f(0)}{s + a_0} .
$$

In the absence of any input (u = 0), the solution of this equation is given by

$$
f(t) = f(0)\, e^{-a_0 t} .
$$

This solution is transient provided that $\mathrm{Re}(a_0) > 0$. This is equivalent to
saying that the pole $-a_0$ of the transfer function $1/(s + a_0)$ lies in the left
half of the plane.

Let us now consider a second order equation:

$$
(D^2 + a_1 D + a_0) f(t) = u(t) .
$$

Taking the Laplace transform and solving for F(s), we find

$$
F(s) = H(s)\, U(s) + H(s) \left[ (s + a_1) f(0) + f'(0) \right] , \tag{3.49}
$$

where the transfer function is given by

$$
H(s) = \frac{1}{s^2 + a_1 s + a_0} .
$$

The poles of H(s) occur at the zeros of $s^2 + a_1 s + a_0$. Two possibilities can
occur: the zeros are simple and distinct, $s_\pm$ say, or there is one double zero at
$s_0$. In either case we will decompose the right-hand side of the transformed
equation (3.49), with u(t) and hence U(s) set to zero, into partial fractions.
In the case of distinct zeros, we have

$$
F(s) = \frac{A_1}{s - s_+} + \frac{A_2}{s - s_-} ,
$$

where $A_1$ and $A_2$ are constants depending on f(0) and f′(0). The transform
is trivial to invert:

$$
f(t) = A_1\, e^{s_+ t} + A_2\, e^{s_- t} ,
$$

which is transient for all $A_1$ and $A_2$ if and only if $\mathrm{Re}(s_\pm) < 0$; in other words,
if and only if the poles of the transfer function lie in the left side of the plane.
On the other hand, if the zero is double, then we have

$$
F(s) = \frac{B_1}{s - s_0} + \frac{B_2}{(s - s_0)^2} ,
$$

where again $B_1$ and $B_2$ are constants depending on f(0) and f′(0). We can
invert the transform and find that

$$
f(t) = B_1\, e^{s_0 t} + B_2\, t\, e^{s_0 t} ,
$$

which is transient for all $B_1$ and $B_2$ if and only if $\mathrm{Re}(s_0) < 0$; so that $s_0$ lies
in the left side of the plane.

In fact this is a general result: a system is stable if all the poles of the 
transfer function lie in the left side of the plane. A formal proof of this 
statement is not hard, but takes some bookkeeping, so we will leave it as an 
exercise for the industrious reader. 

Notice that if one relaxes the condition that the solutions should be tran- 
sient for all initial conditions, then it may happen that for certain types of 
initial conditions non-transient solutions have a zero coefficient. The system 
may therefore seem stable, but only because of the special choice of initial 
conditions.


The damped harmonic oscillator


Stability is not the only property of a system that can be detected by studying 
the poles of the transfer function. With some experience one can detect 
change in the qualitative behaviour of a system by studying the poles. A 
simple example is provided by the damped harmonic oscillator. 

This system is defined by two parameters $\mu$ and $\omega$, both positive real
numbers. The differential equation which governs this system is

$$
(D^2 + 2\mu D + \omega^2) f(t) = u(t) .
$$

The transfer function is

$$
H(s) = \frac{1}{s^2 + 2\mu s + \omega^2} ,
$$

which has poles at

$$
s_\pm = -\mu \pm \sqrt{\mu^2 - \omega^2} .
$$

We must distinguish three separate cases:

(a) (overdamped) $\mu > \omega$
In this case the poles are real and negative:

$$
s_\pm = -\mu \left( 1 \pm \sqrt{1 - \frac{\omega^2}{\mu^2}} \right) .
$$

(b) (critically damped) $\mu = \omega$
In this case there is a double pole, real and negative: $s_+ = s_- = -\mu$.

(c) (underdamped) $\mu < \omega$
In this case the poles are complex:

$$
s_\pm = -\mu \pm i \sqrt{\omega^2 - \mu^2} .
$$

Hence provided that $\mu$ is positive, the system is stable.
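The three cases can be read off mechanically from the pole formula. The following sketch computes the poles of $1/(s^2 + 2\mu s + \omega^2)$ and applies the left-half-plane criterion; the function names are assumptions for illustration, and the final line uses a negative $\mu$, which lies outside the parameter range considered here, only to show the criterion failing.

```python
import cmath

def poles(mu, omega):
    """Poles s_+ and s_- of the transfer function 1 / (s^2 + 2 mu s + omega^2)."""
    root = cmath.sqrt(mu * mu - omega * omega)
    return (-mu + root, -mu - root)

def regime(mu, omega):
    """Classify the oscillator by comparing the damping mu with the frequency omega."""
    if mu > omega:
        return "overdamped"
    if mu == omega:
        return "critically damped"
    return "underdamped"

def is_stable(pole_list):
    """Stability criterion: every pole lies strictly in the left half-plane."""
    return all(p.real < 0 for p in pole_list)

for mu, omega in [(3.0, 1.0), (2.0, 2.0), (1.0, 2.0)]:
    print(regime(mu, omega), poles(mu, omega), is_stable(poles(mu, omega)))

# A negative damping parameter moves the poles into the right half-plane.
print(is_stable(poles(-1.0, 2.0)))
```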

Suppose that we start with the system being overdamped so that the
ratio $\varrho = \omega/\mu$ is less than 1: $\varrho < 1$. As we increase $\varrho$ either by increasing
$\omega$ or decreasing $\mu$, the poles of the transfer function, which start on the
negative real axis, start moving towards each other, coinciding when $\varrho = 1$.
If we continue increasing $\varrho$ so that it becomes greater than 1, the poles move
vertically away from each other keeping their real parts constant. It is the
transition from real to complex poles which offers the most drastic qualitative
change in the behaviour of the system.


3.3.5 Application: convolution and the tautochrone


In this section we discuss a beautiful application of the Laplace transform. 
We also take the opportunity to discuss the convolution of two functions. 


The convolution 


Suppose that f(t) and g(t) are two functions with Laplace transforms F(s) 
and G(s). Consider the product F(s) G(s). Is this the Laplace transform of 
any function? It turns out it is! To see this let us write the product F(s) G(s) 
explicitly: 


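$$
F(s)\, G(s) = \left( \int_0^{\infty} f(u)\, e^{-su}\, du \right) \left( \int_0^{\infty} g(v)\, e^{-sv}\, dv \right) .
$$

We can think of this as a double integral in the positive quadrant of the
(u, v)-plane:

$$
F(s)\, G(s) = \iint e^{-s(u+v)}\, f(u)\, g(v)\, du\, dv . \tag{3.50}
$$

If this were the Laplace transform of anything, it would have to be of the
form

$$
F(s)\, G(s) = \int_0^{\infty} h(t)\, e^{-st}\, dt . \tag{3.51}
$$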


Comparing the two equations we are prompted to define t = u + v. In the
positive quadrant of the (u, v)-plane, t runs from 0 to $\infty$: lines of constant
t having slope −1. Therefore we see that integrating (u, v) in the positive
quadrant is the same as integrating (t, v) where t runs from 0 to $\infty$ and for
every t, v runs from 0 to t:

$$
\iint k(u,v)\, du\, dv = \int_0^{\infty} dt \int_0^t k(t-v, v)\, dv .
$$

In other words, we can rewrite equation (3.50) as

$$
F(s)\, G(s) = \int_0^{\infty} e^{-st} \left[ \int_0^t f(t-v)\, g(v)\, dv \right] dt .
$$

Comparing with equation (3.51), we see that this equation is true provided
that

$$
h(t) = \int_0^t f(t-v)\, g(v)\, dv .
$$

This means that h(t) is the convolution of f and g. The convolution is often
denoted $f \star g$:

$$
(f \star g)(t) \equiv \int_0^t f(t-\tau)\, g(\tau)\, d\tau , \tag{3.52}
$$

and it is characterised by the convolution theorem:

$$
\mathcal{L}\{f \star g\}(s) = F(s)\, G(s) . \tag{3.53}
$$

Notice that $f \star g = g \star f$. This is clear from the fact that F(s) G(s) =
G(s) F(s), but can also be checked directly by making a change of variables
$\tau = t - \sigma$ in the integral in (3.52).
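A small numerical experiment makes the convolution theorem concrete. The sketch below takes $f(t) = e^{-t}$ and $g(t) = \sin t$ (illustrative choices), for which the convolution has the closed form $\frac12(\sin t - \cos t + e^{-t})$, and checks three things: the quadrature reproduces this closed form, the convolution is symmetric, and the Laplace transform of the result at a sample point equals the product F(s) G(s). The helper names and quadrature parameters are assumptions.

```python
import math

def convolve(f, g, t, n=2000):
    """Midpoint-rule approximation of (f * g)(t) as defined in (3.52)."""
    h = t / n
    return h * sum(f(t - (j + 0.5) * h) * g((j + 0.5) * h) for j in range(n))

def laplace(func, s, upper=30.0, n=30000):
    """Trapezoid approximation of the Laplace transform at real s, truncated at upper."""
    h = upper / n
    total = 0.5 * (func(0.0) + func(upper) * math.exp(-s * upper))
    for j in range(1, n):
        t = j * h
        total += func(t) * math.exp(-s * t)
    return total * h

f = lambda t: math.exp(-t)      # F(s) = 1 / (s + 1)
g = math.sin                    # G(s) = 1 / (s^2 + 1)
closed = lambda t: 0.5 * (math.sin(t) - math.cos(t) + math.exp(-t))

t0, s0 = 3.0, 2.0
fg = convolve(f, g, t0)
gf = convolve(g, f, t0)
product = (1.0 / (s0 + 1.0)) * (1.0 / (s0 * s0 + 1.0))
transform_of_convolution = laplace(closed, s0)

print(fg, gf, closed(t0))
print(transform_of_convolution, product)
```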


Abel’s mechanical problem and the tautochrone 


As an amusing application of the convolution theorem for the Laplace trans- 
form, let us consider Abel’s mechanical problem. In short, the problem can 
be described as follows. Consider a bead of mass m which can slide down a 
wire frame under the influence of gravity but without any friction. Suppose 
that the bead is dropped from rest from a height h. Let T(h) denote the time 
it takes to slide down to the ground. If one knows the shape of the wire it 
is a simple matter to determine the function T(h), and we will do so below.
Abel’s mechanical problem is the inverse: given the function T(h) determine
the shape of the wire. As we will see below, this leads to an integral equation 
which has to be solved. In general integral equations are difficult to solve, but 
in this particular case, the integral is in the form of a convolution, whence 
its Laplace transform factorises. It is precisely this feature which makes the 
problem solvable. 


To see what I mean, consider the following integral equation for the unknown function
f(t):

$$
f(t) = 1 + \int_0^t f(t-\tau)\, \sin\tau \, d\tau . \tag{3.54}
$$

We can recognise the integral as the convolution of the functions f(t) and sin t, whence
taking the Laplace transform of both sides of the equation, we have

$$
F(s) = \frac{1}{s} + F(s)\, \frac{1}{s^2 + 1} ,
$$

which we can immediately solve for F(s):

$$
F(s) = \frac{1}{s} + \frac{1}{s^3} ,
$$

which is the Laplace transform of the function

$$
f(t) = 1 + \tfrac{1}{2} t^2 .
$$

One can verify directly that this function obeys the original integral equation (3.54).
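The direct verification can also be done numerically. The sketch below evaluates the right-hand side of (3.54) for the candidate $f(t) = 1 + \frac12 t^2$ with a midpoint rule and compares it with the left-hand side at a few sample times; the function names and the sample times are assumptions for illustration.

```python
import math

def candidate(t):
    # Solution obtained from the Laplace transform: f(t) = 1 + t^2 / 2.
    return 1.0 + 0.5 * t * t

def right_hand_side(t, n=4000):
    # 1 + integral from 0 to t of f(t - tau) sin(tau) d tau, by the midpoint rule.
    h = t / n
    integral = h * sum(
        candidate(t - (j + 0.5) * h) * math.sin((j + 0.5) * h) for j in range(n)
    )
    return 1.0 + integral

for t in (0.5, 1.0, 3.0):
    print(t, candidate(t), right_hand_side(t))
```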


Figure 3.4: Abel’s mechanical problem 


In order to set up Abel’s mechanical problem, it will prove convenient to
keep Figure 3.4 in mind. We will assume that the wire has no torsion, so
that the motion of the bead happens in one plane: the (x, y) plane with y the
vertical displacement and x the horizontal displacement. We choose our axes
in such a way that the wire touches the ground at the origin of the plane: (0, 0).
The shape of the wire is given by a function y = y(x), with y(0) = 0. Let $\ell$
denote the length along the wire from the origin to the point (x, y = y(x))
on the wire. We drop the bead from rest from a height h. Because there is
no friction, energy is conserved. The kinetic energy of the bead at any time
t after being dropped is given by

$$
T = \frac{1}{2}\, m \left( \frac{d\ell}{dt} \right)^2 ,
$$

whereas the potential energy is given by

$$
V = -mg(h - y) .
$$

Conservation of energy says that T + V is a constant. To compute this
constant, let us evaluate this at the moment the bead is dropped, t = 0.
Because it is dropped from rest, $d\ell/dt = 0$ at t = 0, and hence T = 0.
Since at t = 0, y = h the potential energy also vanishes and we have that
T + V = 0. This identity can be rewritten as

$$
\frac{1}{2}\, m \left( \frac{d\ell}{dt} \right)^2 = mg(h - y) ,
$$


from which we can find a formula for dé/dt: 


> Sse: (3.55) 


where we have chosen the negative sign for the square root, because as the 
bead falls, 2 decreases. Now, the length element along the wire is given by 


dl = \/dx? + dy? , 


where dx and dy are not independent since we have a relation y = y(x). 
Inverting this relation gives x as a function of y and we can use this to write 


Nie 


d 2 
de =/1+ @ dy = f(y) dy , (3.56) 


which defines the function f(y). Clearly, f(y) encodes the information about
the shape of the wire: knowing f(y) for all y allows us to solve for the
dependence of x on y and vice versa. Indeed, suppose that f(y) is known,
then solving for dx/dy, we have that

$$ \frac{dx}{dy} = \sqrt{f(y)^2 - 1} , $$

from where we have

$$ dx = \sqrt{f(y)^2 - 1}\; dy , \tag{3.57} $$

which can then be integrated to find x as a function of y, and by inverting
this, y as a function of x.
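Let us rewrite equation (3.55) as

$$ dt = -\frac{1}{\sqrt{2g(h-y)}}\, d\ell , $$

and insert equation (3.56) in this equation, to obtain

$$ dt = -\frac{f(y)}{\sqrt{2g(h-y)}}\, dy . $$

Finally we integrate this along the trajectory of the bead, as it falls from
y = h at t = 0 until y = 0 at t = T(h):

$$ \int_0^{T(h)} dt = -\frac{1}{\sqrt{2g}} \int_h^0 \frac{f(y)}{\sqrt{h-y}}\, dy , $$

whence

$$ T(h) = \frac{1}{\sqrt{2g}} \int_0^h \frac{f(y)}{\sqrt{h-y}}\, dy . \tag{3.58} $$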


This formula gives us how long it takes for the bead to fall along the wire 
from a height h: so if we know the shape of the wire, and hence f(y), we 
can compute T(h) just by integrating. On the other hand, suppose that we 
are given T(h) and we want to solve for the shape of the wire. This means 
solving equation (3.58) for f(y) and then finding y = y(x) from f(y). The
latter half of the problem is a first order differential equation, but the former 
half is an integral equation. In general this problem would be quite difficult, 
but because we notice that the integral in the right hand side is in the form 
of a convolution, we can try to solve this by using the Laplace transform. 
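
The forward direction, computing T(h) from a known f(y), is easy to sketch numerically. The code below is only an illustration under stated assumptions: the value g = 9.81 m/s², the function name `fall_time` and the midpoint rule are choices made here, not part of the notes. The substitution y = h sin²φ turns dy/√(h − y) into 2√h sin φ dφ, which removes the square-root singularity at y = h.

```python
import math

def fall_time(f, h, g=9.81, n=2000):
    """Approximate T(h) = (2g)^(-1/2) * int_0^h f(y) / sqrt(h - y) dy.

    Uses y = h*sin(phi)^2, so that dy/sqrt(h - y) = 2*sqrt(h)*sin(phi) dphi,
    and a midpoint rule in phi on [0, pi/2] (f is never evaluated at y = 0).
    """
    dphi = (math.pi / 2) / n
    total = 0.0
    for i in range(n):
        phi = (i + 0.5) * dphi
        y = h * math.sin(phi) ** 2
        total += 2.0 * math.sqrt(h) * math.sin(phi) * f(y)
    return total * dphi / math.sqrt(2.0 * g)

# For a constant f(y) = c the integral is elementary: T(h) = c * sqrt(2h/g).
h, c = 2.0, 1.25
print(fall_time(lambda y: c, h), c * math.sqrt(2.0 * h / 9.81))
```

The two printed numbers should agree to within the quadrature error; any other shape can be tried by passing a different f.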

Before doing so, however, let us check that we have not made a mistake,
by testing the integral expression for T(h) in some cases where we know
the answer. Suppose, for instance, that the wire is completely vertical. This
means that dx/dy = 0, whence f(y) = 1. In this case, equation (3.58)
simplifies enormously, and we get

$$ T(h) = \frac{1}{\sqrt{2g}} \int_0^h \frac{dy}{\sqrt{h-y}} = \sqrt{\frac{2h}{g}} , $$

as expected from elementary newtonian mechanics. Similarly, suppose that the wire is
inclined at an angle $\theta$ to the horizontal, so that $y(x) = \tan\theta \; x$. Then
$dx/dy = \cot\theta$, and hence f(y) is given by

$$ f(y) = \sqrt{1 + \left(\frac{dx}{dy}\right)^2} = \sqrt{1 + \cot^2\theta} = \csc\theta . $$


Therefore, the time taken to fall is simply $\csc\theta$ times the vertical time of fall:

$$ T(h) = \csc\theta \, \sqrt{\frac{2h}{g}} , $$

which, since $\csc(\pi/2) = 1$, agrees with the previous result.

Let us now take the Laplace transform of both sides of equation (3.58),
thinking of them both as functions of h; in other words, the Laplace transform
G(s) of a function g(h) is given by

$$ G(s) = \int_0^\infty e^{-sh}\, g(h)\, dh . $$

(This is Shakespeare’s theorem yet again!) Applying this to equation (3.58),
we find

$$ T(s) = \frac{1}{\sqrt{2g}}\, F(s)\, \mathcal{L}\left\{\frac{1}{\sqrt{h}}\right\}(s) , $$

where T(s) is the Laplace transform of the function T(h), and F(s) is the Laplace
transform of the function f. The Laplace transform of the function $1/\sqrt{h}$
was worked out in the problems and the result is:

$$ \mathcal{L}\left\{\frac{1}{\sqrt{h}}\right\}(s) = \sqrt{\frac{\pi}{s}} . \tag{3.59} $$


We can then solve for F(s) in terms of T(s) as follows:

$$ F(s) = \sqrt{\frac{2g}{\pi}}\, \sqrt{s}\; T(s) , \tag{3.60} $$

which can in principle be inverted to solve for f, either from Table 3.1 or, if
all else fails, from the inversion formula (3.39).
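
The transform (3.59), on which (3.60) rests, can be checked numerically. This is a rough sketch only: the function name `laplace_inv_sqrt`, the truncation point and the midpoint rule are assumptions made for the illustration. Substituting h = u² gives 2∫₀^∞ exp(−s u²) du, which has no singularity at the origin.

```python
import math

def laplace_inv_sqrt(s, n=20000):
    """Approximate L{1/sqrt(h)}(s) = int_0^inf exp(-s*h) / sqrt(h) dh for s > 0.

    With h = u^2 this is 2 * int_0^inf exp(-s*u^2) du. The integral is
    truncated at u_max, where the integrand is about exp(-64).
    """
    u_max = 8.0 / math.sqrt(s)
    du = u_max / n
    return 2.0 * du * sum(math.exp(-s * ((i + 0.5) * du) ** 2) for i in range(n))

for s in (0.5, 1.0, 3.0):
    print(s, laplace_inv_sqrt(s), math.sqrt(math.pi / s))
```

The second and third columns should coincide to many digits, in line with (3.59).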

Let us apply this to solving for the shape that the wire must have for it to
have the curious property that no matter what height we drop the bead from,
it will take the same amount of time to fall to the ground. Such a shape is
known as the tautochrone. Clearly, the tautochrone is such that $T(h) = \tau$ is
constant, whence its Laplace transform is $T(s) = \tau/s$. Substituting this into
equation (3.60), we get

$$ F(s) = \sqrt{\frac{2g}{\pi}}\, \frac{\tau}{\sqrt{s}} = \frac{\sqrt{2g}\,\tau}{\pi}\, \sqrt{\frac{\pi}{s}} , $$

where we have rewritten it in a way that makes it easy to invert. From
equation (3.59) we immediately see that

$$ f(y) = \frac{\sqrt{2g}\,\tau}{\pi}\, \frac{1}{\sqrt{y}} . $$

To reconstruct the formula for the shape of the wire, we apply equation
(3.57) to obtain

$$ dx = \sqrt{\frac{2g\tau^2}{\pi^2 y} - 1}\; dy , $$

which can be integrated to

$$ x = \int_0^y \sqrt{\frac{2g\tau^2}{\pi^2 y'} - 1}\; dy' = \int_0^y \frac{\sqrt{\frac{2g\tau^2}{\pi^2} - y'}}{\sqrt{y'}}\, dy' . \tag{3.61} $$


Notice that the constant of integration is fixed to 0 since the wire is such
that when x = 0, y = 0. This integral can be performed by a trigonometric
substitution. First of all let us define

$$ b = \frac{2g\tau^2}{\pi^2} , $$

and let $y = b\,(\sin\phi)^2$, so that

$$ dy = 2b \sin\phi \cos\phi \, d\phi . $$

Substituting into the integral in (3.61), we find

$$ x = \int_0^{\phi(y)} 2b\,(\cos\phi)^2 \, d\phi = \frac{b}{2} \left[ 2\phi(y) + \sin 2\phi(y) \right] , $$

where

$$ y = b\,(\sin\phi(y))^2 = \frac{b}{2} \left( 1 - \cos 2\phi(y) \right) . $$

If we define a = b/2 and $\theta = 2\phi(y)$, we have the following parametric
representation for the curve in the (x,y) plane defining the wire:

$$ x = a(\theta + \sin\theta) \qquad\text{and}\qquad y = a(1 - \cos\theta) . $$


This curve is called a cycloid. It is the curve traced by a point in the rim of
a circle of radius a rolling upside down without sliding along the line y = 2a
(so that its centre moves along the line y = a), as shown in Figure 3.5.


Figure 3.5: The cycloid 
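
The tautochrone property can be tested numerically by feeding f(y) = √(b/y), with b = 2gτ²/π², back into the fall-time integral (3.58). The sketch below makes several assumptions for the sake of illustration: g = 9.81 m/s², τ = 1.5 s, the function names, and the same substitution y = h sin²φ with a midpoint rule. Note that f(y) ≥ 1 requires y ≤ b, so the drop heights must not exceed b.

```python
import math

G = 9.81  # assumed gravitational acceleration in m/s^2 (not fixed by the notes)

def fall_time(f, h, g=G, n=2000):
    """Midpoint-rule estimate of (3.58) after substituting y = h*sin(phi)^2."""
    dphi = (math.pi / 2) / n
    total = 0.0
    for i in range(n):
        phi = (i + 0.5) * dphi
        y = h * math.sin(phi) ** 2
        total += 2.0 * math.sqrt(h) * math.sin(phi) * f(y)
    return total * dphi / math.sqrt(2.0 * g)

def tautochrone_f(y, tau, g=G):
    """f(y) = sqrt(b / y) with b = 2*g*tau^2 / pi^2."""
    b = 2.0 * g * tau ** 2 / math.pi ** 2
    return math.sqrt(b / y)

tau = 1.5                                 # illustrative target fall time (s)
b = 2.0 * G * tau ** 2 / math.pi ** 2     # about 4.47 for these values
for h in (0.1, 0.5, 1.0, 4.0):            # all heights are below b
    print(h, fall_time(lambda y: tautochrone_f(y, tau), h))
```

Every line should print a time equal to τ up to rounding, whatever the starting height.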
The cycloid also has another interesting property: it is the brachistochrone,
namely the shape of the wire for which the time T(h) is minimised.
Although the proof is not hard, we will not do it here.

3.3.6 The Gamma and Zeta functions


This section falls outside the main scope of these notes, but since it allows a 
glimpse at some of the deepest and most beautiful aspects of mathematics, 
I could not resist the temptation to include it. 

It is possible to consider the Laplace transform of complex powers $t^z$,
with z some complex number. We see that

$$ \mathcal{L}\{t^z\}(s) = \int_0^\infty t^z e^{-st}\, dt = \frac{1}{s^{z+1}} \int_0^\infty u^z e^{-u}\, du , $$

where we have changed the variable of integration from t to u = st. Let us
introduce the Euler Gamma function

$$ \Gamma(z) := \int_0^\infty t^{z-1} e^{-t}\, dt , \tag{3.62} $$

which converges for Re(z) > 0. Then we have that

$$ \mathcal{L}\{t^z\}(s) = \frac{\Gamma(z+1)}{s^{z+1}} . $$

Comparing with equation (3.41), we see that $\Gamma(n+1) = n!$, whence we
can think of the Gamma function as a way to define the factorial of a complex
number. Although the integral representation (3.62) is only defined
for Re(z) > 0 it is possible to extend $\Gamma(z)$ to a holomorphic function with
only isolated singularities in the whole complex plane: simple poles at the
nonpositive integers.

To see this notice that for Re(z) > 0, we can derive a recursion relation
for $\Gamma(z)$ extending the well-known n! = n (n − 1)! for positive integers n.
Consider

$$ \Gamma(z+1) = \int_0^\infty t^z e^{-t}\, dt . $$

Integrating by parts,

$$ \Gamma(z+1) = \int_0^\infty z\, t^{z-1} e^{-t}\, dt - t^z e^{-t} \Big|_0^\infty = z\,\Gamma(z) + \lim_{t\to 0} t^z e^{-t} . $$

Provided that Re(z) > 0, the boundary term vanishes and we have

$$ \Gamma(z+1) = z\,\Gamma(z) . \tag{3.63} $$

Turning this equation around, we have that

$$ \Gamma(z) = \frac{\Gamma(z+1)}{z} . $$

Since $\Gamma(1) = 1$, which incidentally justifies the usual claim that 0! = 1, we
see that $\Gamma(z)$ has a simple pole at z = 0 with residue 1. Using this recursion
relation repeatedly, we see that $\Gamma(z)$ has simple poles at all the nonpositive
integers, with residue

$$ \operatorname{Res}(\Gamma; -k) = \frac{(-1)^k}{k!} , $$

and these are all the singularities.
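
These properties are easy to probe with the real-valued `math.gamma` from the Python standard library. The following is only a sketch for real arguments: the helper `residue_estimate` and the step `eps` are illustrative choices, and the residue is estimated crudely as ε Γ(−k + ε) for a small ε.

```python
import math

# Gamma(n + 1) = n! for small non-negative integers n.
for n in range(8):
    assert math.isclose(math.gamma(n + 1), math.factorial(n))

# Recursion (3.63): Gamma(z + 1) = z * Gamma(z), tested at a few real z > 0.
for z in (0.3, 1.7, 4.25):
    assert math.isclose(math.gamma(z + 1), z * math.gamma(z))

def residue_estimate(k, eps=1e-6):
    """Estimate Res(Gamma; -k) by eps * Gamma(-k + eps) for small eps."""
    return eps * math.gamma(-k + eps)

for k in range(5):
    print(k, residue_estimate(k), (-1) ** k / math.factorial(k))
```

The last two columns should agree to roughly six digits, consistent with the residue (−1)^k/k! at z = −k.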

The Gamma function is an extremely important function in mathematics,
not least of all because it is intimately related to another illustrious function:
the Riemann Zeta function $\zeta(z)$, defined for Re(z) > 1 by the converging
series

$$ \zeta(z) := \sum_{n=1}^\infty \frac{1}{n^z} . $$

To see the relation notice that

$$ \int_0^\infty t^{z-1} e^{-nt}\, dt = \frac{1}{n^z} \int_0^\infty u^{z-1} e^{-u}\, du = \frac{\Gamma(z)}{n^z} , $$

where we have changed variables of integration from t to u = nt. Summing
both sides of this identity over all positive integers n, we have, on the one
hand

$$ \sum_{n=1}^\infty \frac{\Gamma(z)}{n^z} = \Gamma(z)\,\zeta(z) , $$

and on the other

$$ \sum_{n=1}^\infty \int_0^\infty t^{z-1} e^{-nt}\, dt = \int_0^\infty t^{z-1} \sum_{n=1}^\infty e^{-nt}\, dt = \int_0^\infty \frac{t^{z-1}}{e^t - 1}\, dt , $$

where we have interchanged the order of summation and integration, and summed
the geometric series. (This can be justified, although we will not do so here.)
As a result we have the following integral representation for the Zeta function

$$ \zeta(z) = \frac{1}{\Gamma(z)} \int_0^\infty \frac{t^{z-1}}{e^t - 1}\, dt . $$
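
The integral representation of the Zeta function can be compared against the known values ζ(2) = π²/6 and ζ(4) = π⁴/90. The sketch below is restricted to real z ≥ 2, where the integrand stays bounded near t = 0; the function name, the truncation at t = 60 and the midpoint rule are assumptions made for this illustration.

```python
import math

def zeta_from_integral(z, t_max=60.0, n=60000):
    """(1/Gamma(z)) * int_0^inf t^(z-1) / (e^t - 1) dt for real z >= 2.

    The integral is truncated at t_max, where the integrand is negligible,
    and evaluated with the midpoint rule (which never touches t = 0).
    """
    dt = t_max / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * dt
        total += t ** (z - 1) / math.expm1(t)
    return total * dt / math.gamma(z)

print(zeta_from_integral(2), math.pi ** 2 / 6)
print(zeta_from_integral(4), math.pi ** 4 / 90)
```

Each pair of printed numbers should agree to within the quadrature error.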


The only source of singularities in the integral is the zero of $e^t - 1$ at the
origin, so we can split the integral into two as follows:

$$ \zeta(z) = \frac{1}{\Gamma(z)} \int_0^1 \frac{t^{z-1}}{e^t - 1}\, dt + \frac{1}{\Gamma(z)} \int_1^\infty \frac{t^{z-1}}{e^t - 1}\, dt . $$

It is possible to show that $\Gamma(z)$ has no zeros, whence $1/\Gamma(z)$ is entire. Similarly,
the second integral $\int_1^\infty$ is also entire since the integrand is continuous
there. Hence the singularity structure of the Zeta function is contained in
the first integral. We can do a Laurent expansion of the integrand around
t = 0:

$$ \frac{1}{e^t - 1} = \frac{1}{t} - \frac{1}{2} + \frac{t}{12} + \cdots , $$

where, apart from the constant term, only odd powers of t appear. Therefore integrating
termwise, which we can do because Laurent series converge uniformly, we
have that

$$ \int_0^1 \frac{t^{z-1}}{e^t - 1}\, dt = \frac{1}{z-1} - \frac{1}{2}\,\frac{1}{z} + \frac{1}{12}\,\frac{1}{z+1} + \cdots , \tag{3.64} $$


where the terms which have been omitted are all of the form $a_k/(z+k)$ where
k is a positive odd integer. This shows that the integral $\int_0^1$ has simple poles
at z = 1, z = 0, and z = −k with k a positive odd integer. Because the
integral is multiplied by $1/\Gamma(z)$, and the Gamma function has simple poles
at the nonpositive integers, we see immediately that


• $\zeta(z)$ has a simple pole at z = 1 with residue $1/\Gamma(1) = 1$, and is analytic
everywhere else; and


• $\zeta(-2n) = 0$ where n is any positive integer: these are the zeros of
$1/\Gamma(z)$ which are not cancelled by the poles in (3.64).
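
The Laurent expansion of 1/(e^t − 1) about t = 0, whose coefficients 1, −1/2 and 1/12 become the coefficients of the poles at z = 1, 0 and −1 in (3.64), can be checked numerically. In this sketch the function name `laurent` is illustrative, and the extra term −t³/720 is the standard next term of the series, included here although the notes do not display it.

```python
import math

def laurent(t):
    """First terms of the expansion of 1/(e^t - 1) about t = 0."""
    return 1.0 / t - 0.5 + t / 12.0 - t ** 3 / 720.0

for t in (0.5, 0.1, 0.01):
    exact = 1.0 / math.expm1(t)
    print(t, exact - laurent(t))   # the remainder is of order t^5
```

The printed remainders shrink rapidly with t, as expected when only odd powers follow the constant term.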


The celebrated Riemann hypothesis states that all other zeros of $\zeta(z)$ occur
on the line Re(z) = 1/2. Now that Fermat’s Last Theorem has been proven,
the Riemann hypothesis remains the most important open problem in mathematics
today.

The importance of settling this hypothesis stems from the intimate relationship
between the Zeta function and the theory of numbers. The key to
this relationship is the following infinite product expansion for $\zeta(z)$, valid for
Re(z) > 1:

$$ \zeta(z) = \prod_{\text{primes } p} \frac{1}{1 - \frac{1}{p^z}} , $$

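which follows from the unique factorisation of every positive integer into a
product of primes. To see this notice that since, for Re(z) > 1, one has

$$ \zeta(z) = 1 + \frac{1}{2^z} + \frac{1}{3^z} + \frac{1}{4^z} + \frac{1}{5^z} + \cdots , $$

then it follows that

$$ \frac{1}{2^z}\, \zeta(z) = \frac{1}{2^z} + \frac{1}{4^z} + \frac{1}{6^z} + \frac{1}{8^z} + \cdots , $$

whence

$$ \left(1 - \frac{1}{2^z}\right) \zeta(z) = 1 + \frac{1}{3^z} + \frac{1}{5^z} + \frac{1}{7^z} + \cdots . $$

In other words we have in the right-hand side only those terms $1/n^z$ where
n is odd. Similarly,

$$ \left(1 - \frac{1}{3^z}\right) \left(1 - \frac{1}{2^z}\right) \zeta(z) = 1 + \frac{1}{5^z} + \frac{1}{7^z} + \frac{1}{11^z} + \cdots , $$

where now we have in the right-hand side only those terms $1/n^z$ where n is
not divisible by 2 or by 3. Continuing in this fashion, we have that

$$ \prod_{\text{primes } p} \left(1 - \frac{1}{p^z}\right) \zeta(z) = 1 . $$

By the way, this shows that $\zeta(z)$ has no zeros for Re(z) > 1.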
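
The product expansion can be illustrated by truncating both the product over primes and the defining series at a finite cutoff. The sketch below uses z = 2, where the limit π²/6 is known; the function names and the cutoff 10000 are arbitrary choices made for this example.

```python
import math

def primes_up_to(n):
    """Sieve of Eratosthenes: all primes p <= n (for n >= 2)."""
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for p in range(2, math.isqrt(n) + 1):
        if sieve[p]:
            # Cross out the multiples of p, starting from p*p.
            sieve[p * p :: p] = [False] * len(range(p * p, n + 1, p))
    return [p for p, is_prime in enumerate(sieve) if is_prime]

def euler_product(z, p_max=10000):
    """Truncated product over primes p <= p_max of 1 / (1 - p^(-z))."""
    prod = 1.0
    for p in primes_up_to(p_max):
        prod /= 1.0 - p ** (-z)
    return prod

def partial_sum(z, n_max=10000):
    """Truncated series: sum of n^(-z) for 1 <= n <= n_max."""
    return sum(n ** (-z) for n in range(1, n_max + 1))

print(euler_product(2), partial_sum(2), math.pi ** 2 / 6)
```

Both truncations approach π²/6 from below, since every omitted factor of the product exceeds 1 and every omitted term of the series is positive.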

The Zeta function and its generalisations also play a useful role in physics: 
particularly in quantum field theory, statistical mechanics, and, of course, 
in string theory. In fact, together with the heat kernel, introduced in the 
problems, the (generalised) Zeta function proves invaluable in computing 
determinants and traces of infinite-dimensional matrices.


Areas of spheres 


As a minor application of the Gamma function, let us compute the area of a
unit sphere in n dimensions, for n ≥ 2.

What do we mean by a unit sphere in n dimensions? The unit sphere
in n dimensions is the set of points in n-dimensional euclidean space which
are a unit distance away from the origin. If we let $(x_1, x_2, \ldots, x_n)$ be the
coordinates for euclidean space, the unit sphere is the set of points which
satisfy the equation

$$ \sum_{i=1}^n x_i^2 = x_1^2 + x_2^2 + \cdots + x_n^2 = 1 . $$

In n = 2 dimensions, the unit “sphere” is a circle, whereas in n = 3 dimensions
it is the usual sphere of everyday experience. For n > 3, the sphere is
harder to visualise, but one can still work with it via the algebraic description
above.

What do we mean by its area? We mean the (n − 1)-dimensional area: so
if n = 2, we mean the circumference of the circle, and if n = 3 we mean
the usual area of everyday experience. Again it gets harder to visualise for
n > 3, but one can again tackle the problem algebraically as above.

Clearly every point in n-dimensional space lies on some sphere: if it is a 
distance r away from the origin then, by definition, it lies on the sphere of 
radius r. There are an uncountable number of spheres in euclidean space, 
one for every positive real number. All these spheres taken together with the 
origin (a “sphere” of zero radius) make up all of euclidean space. A simple 
scaling argument shows that if we double the radius, we multiply the area of 
the sphere by $2^{n-1}$. More generally, the area of the sphere at radius r will
be $r^{n-1}$ times the area of the unit sphere. Therefore the volume element in
n dimensions is

$$ d^n x = r^{n-1}\, dr\, d\Omega , $$


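where $d\Omega$ is the area element of the unit sphere. We will now integrate
the function $\exp(-r^2)$ over all of the euclidean space. We can compute this
integral in either of two ways. On the one hand,

$$ I := \int e^{-r^2}\, d^n x = \int\!\cdots\!\int e^{-x_1^2 - x_2^2 - \cdots - x_n^2}\, dx_1\, dx_2 \cdots dx_n $$
$$ = \left(\int_{-\infty}^\infty e^{-x_1^2}\, dx_1\right) \left(\int_{-\infty}^\infty e^{-x_2^2}\, dx_2\right) \cdots \left(\int_{-\infty}^\infty e^{-x_n^2}\, dx_n\right) = \left(\int_{-\infty}^\infty e^{-x^2}\, dx\right)^n , $$

which is computed to give $(\sqrt{\pi})^n$ after using the elementary gaussian result:
$\int_{-\infty}^\infty \exp(-x^2)\, dx = \sqrt{\pi}$. On the other hand,

$$ I = \int\!\!\int e^{-r^2}\, r^{n-1}\, dr\, d\Omega = \left(\int_0^\infty e^{-r^2}\, r^{n-1}\, dr\right) \left(\int d\Omega\right) . $$

The integral of $d\Omega$ is simply the area A of the unit sphere, which is what
we want to calculate. The radial integral can be calculated in terms of the
Gamma function after changing the variable of integration from r to $t = r^2$:

$$ \int_0^\infty e^{-r^2}\, r^{n-1}\, dr = \frac{1}{2} \int_0^\infty e^{-t}\, t^{(n-2)/2}\, dt = \frac{1}{2}\, \Gamma(n/2) . $$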
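
The radial integral can be compared with Γ(n/2)/2 numerically. This is a small sketch under stated assumptions: the function name `radial_integral`, the truncation at r = 10 and the midpoint rule are choices made for the illustration.

```python
import math

def radial_integral(n, r_max=10.0, steps=20000):
    """Midpoint estimate of int_0^inf exp(-r^2) r^(n-1) dr, truncated at r_max."""
    dr = r_max / steps
    total = 0.0
    for i in range(steps):
        r = (i + 0.5) * dr
        total += math.exp(-r * r) * r ** (n - 1)
    return total * dr

for n in range(1, 7):
    print(n, radial_integral(n), 0.5 * math.gamma(n / 2))
```

For each n the two printed values should agree to within the quadrature error.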
0 0 2 


Equating both ways to compute the integral, we arrive at the following formula
for the area A(n) of the unit sphere in n dimensions:

$$ A(n) = \frac{2\pi^{n/2}}{\Gamma(n/2)} . \tag{3.65} $$

To see that this beautiful formula is not obviously wrong, let us see that
it reproduces what we know. For n = 2, by the area of the unit sphere we
mean the circumference of the unit circle, that is $2\pi$. In n = 3, we expect the
area of the standard unit sphere and that is $4\pi$. Let us see if our expectations
are borne out. According to the formula,

$$ A(2) = \frac{2\pi}{\Gamma(1)} = 2\pi , $$

as expected. For n = 3 the formula says

$$ A(3) = \frac{2\pi^{3/2}}{\Gamma(3/2)} . $$


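We can compute the half-integral values of the Gamma function as follows.
First we have that

$$ \Gamma(\tfrac12) = \int_0^\infty t^{-1/2} e^{-t}\, dt . $$

Changing variables to $t = u^2$, we have

$$ \Gamma(\tfrac12) = 2 \int_0^\infty e^{-u^2}\, du = \int_{-\infty}^\infty e^{-u^2}\, du = \sqrt{\pi} . $$

Now using the recursion relation (3.63), we have that

$$ \Gamma(k + \tfrac12) = \frac{2k-1}{2}\, \frac{2k-3}{2} \cdots \frac{1}{2}\; \Gamma(\tfrac12) = \frac{(2k-1)!!}{2^k}\, \sqrt{\pi} . $$

In particular, $\Gamma(\tfrac32) = \tfrac12 \sqrt{\pi}$, whence

$$ A(3) = \frac{2\pi^{3/2}}{\tfrac12 \sqrt{\pi}} = 4\pi , $$

as expected.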


How about for n = 1? This case is a little special: in one dimension the unit sphere 
consists of the points ±1, so that it is a zero-dimensional set. Is there an intrinsic notion
of area for a zero-dimensional set? If we evaluate the above formula for A(n) at n = 1, 
we get an answer: A(1) = 2, which is counting the number of points: in other words, 
zero-dimensional area is simply the cardinality of the set: the number of elements. This 
is something that perhaps we would not have expected. As someone said once, some 
formulae are more clever than the people who come up with them. 


Now that we trust the formula, we can compute a few more values to
learn something new. First let us simplify the formula by evaluating the
Gamma function at the appropriate values. Distinguishing between odd and
even dimensions, we find

$$ A(n) = \begin{cases} \dfrac{2\pi^\ell}{(\ell-1)!} , & \text{for } n = 2\ell, \text{ and} \\[2ex] \dfrac{2^{\ell+1}\pi^\ell}{(2\ell-1)!!} , & \text{for } n = 2\ell+1. \end{cases} \tag{3.66} $$

The next few values are

$$ A(4) = 2\pi^2 , \quad A(5) = \frac{8\pi^2}{3} , \quad A(6) = \pi^3 , \quad A(7) = \frac{16\pi^3}{15} , \quad A(8) = \frac{\pi^4}{3} . $$
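
These values can be reproduced directly from the formula A(n) = 2π^{n/2}/Γ(n/2) using `math.gamma`. The function name `sphere_area` is an illustrative choice; the sketch simply tabulates the formula and compares it with the closed forms quoted in the text.

```python
import math

def sphere_area(n):
    """Area A(n) = 2*pi^(n/2) / Gamma(n/2) of the unit sphere in n dimensions."""
    return 2.0 * math.pi ** (n / 2) / math.gamma(n / 2)

# Closed forms quoted in the text, including the special case n = 1.
expected = {
    1: 2.0,
    2: 2.0 * math.pi,
    3: 4.0 * math.pi,
    4: 2.0 * math.pi ** 2,
    5: 8.0 * math.pi ** 2 / 3.0,
    6: math.pi ** 3,
    7: 16.0 * math.pi ** 3 / 15.0,
    8: math.pi ** 4 / 3.0,
}

for n, value in expected.items():
    print(n, sphere_area(n), value)
```

The two columns should coincide up to floating-point rounding.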


In case you are wondering whether this is at all useful, it actually comes in handy when 
normalising electromagnetic fields in higher-dimensional field theories so that they have 
integral fluxes around charged objects (e.g., branes and black holes). 


Let us end with another nice formula. How about the n-dimensional
volume V(n) of the unit ball, i.e., the interior of the unit sphere? We can
compute this by integrating the areas of the spheres from radius 0 to radius
1. The area of the sphere of radius r will be $r^{n-1}$ times the area of the sphere
of unit radius, so that the volume is then

$$ V(n) = \int_0^1 A(n)\, r^{n-1}\, dr = \frac{A(n)}{n} . $$

Using the formula (3.65), we see that

$$ V(n) = \frac{2\pi^{n/2}}{n\,\Gamma(n/2)} = \frac{\pi^{n/2}}{\frac{n}{2}\,\Gamma(n/2)} = \frac{\pi^{n/2}}{\Gamma\left(\frac{n+2}{2}\right)} , $$

where we have used the recursion formula (3.63). Because the unit sphere
is inscribed inside the cube of side length 2, the ratio of the volume of the unit
ball to that of the cube circumscribing it is given by

$$ \frac{V(n)}{2^n} = \frac{\pi^{n/2}}{2^n\, \Gamma\left(\frac{n+2}{2}\right)} . $$

If we were to plot this as a function of n we would notice that it starts at 1 for
n = 1 and then decreases quite fast, so that the ball takes up less and less of
the volume of the cube which circumscribes it.
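
In place of a plot, the ratio can be tabulated for the first few dimensions. The function names below are illustrative; the sketch just evaluates V(n) = π^{n/2}/Γ(n/2 + 1) and divides by the volume 2^n of the circumscribing cube.

```python
import math

def ball_volume(n):
    """Volume V(n) = pi^(n/2) / Gamma(n/2 + 1) of the unit ball in n dimensions."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)

def ball_to_cube_ratio(n):
    """Fraction of the cube of side 2 occupied by the inscribed unit ball."""
    return ball_volume(n) / 2.0 ** n

for n in range(1, 11):
    print(n, ball_to_cube_ratio(n))
```

The ratio starts at 1 for n = 1, equals π/4 and π/6 for n = 2 and n = 3, and keeps decreasing from there.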